Hardness Results for Neural Network Approximation Problems

We consider the problem of efficiently learning in two-layer neural networks. We show that it is NP-hard to find a linear threshold network of a fixed size that approximately minimizes the proportion of misclassified examples in a training set, even if there is a network that correctly classifies all of the training examples. In particular, for a training set that is correctly classified by some two-layer linear threshold network with k hidden units, it is NP-hard to find such a network that makes mistakes on a proportion smaller than c/k^3 of the examples, for some constant c. We prove a similar result for the problem of approximately minimizing the quadratic loss of a two-layer network with a sigmoid output unit.
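
To fix ideas, the first result can be stated formally as follows (a sketch in our own notation; the class name N_k and the error functional err_S are labels introduced here, not taken verbatim from the paper). Let N_k denote the class of two-layer linear threshold networks on R^n with k hidden units:

    N_k = \left\{ x \mapsto \mathrm{sign}\Big( w_0 + \sum_{i=1}^{k} w_i \, \mathrm{sign}( v_i \cdot x + b_i ) \Big) : w_0, \dots, w_k \in \mathbb{R},\ v_i \in \mathbb{R}^n,\ b_i \in \mathbb{R} \right\}.

For a training set S = {(x_1, y_1), ..., (x_m, y_m)} with labels y_j in {-1, +1}, write

    \mathrm{err}_S(h) = \frac{1}{m} \,\big|\{\, j : h(x_j) \neq y_j \,\}\big|

for the proportion of misclassified examples. The theorem then asserts that there is a constant c > 0 such that, given any S correctly classified by some h* in N_k (so err_S(h*) = 0), it is NP-hard to output an h in N_k with err_S(h) < c/k^3. The second result is analogous, with the 0-1 error replaced by the quadratic loss (1/m) \sum_j (h(x_j) - y_j)^2 over two-layer networks whose output unit is sigmoidal.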
