On the complexity of training neural networks with continuous activation functions

This paper deals with the computational complexity of loading a fixed-architecture neural network with a set of positive and negative examples. It gives the first hardness result for loading a simple three-node architecture that is built not from binary-threshold neurons but from a particular continuous activation function commonly used in the neural-network literature. The authors observe that the loading problem is solvable in polynomial time if the input dimension is constant. Otherwise, however, any learning algorithm based on particular fixed architectures faces severe computational barriers. Similar theorems had previously been proved by Megiddo and by Blum and Rivest, but only for binary-threshold networks. The authors' theoretical results lend further support to training networks with incremental (architecture-changing) techniques rather than with fixed architectures. They also imply that learning is hard in the probably approximately correct (PAC) sense.
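To make "loading" concrete, the Python sketch below shows the three-node architecture: two hidden nodes with a continuous activation feed one output node. It also checks whether a given weight assignment classifies every example correctly. Several details are assumptions, because the abstract does not state them. The activation is taken to be the saturated-linear function `sat_linear(z) = min(1, max(0, z))`, the output node is assumed to be a binary threshold unit, and the names `three_node_net`, `loads`, and the weight layout are hypothetical. Checking one candidate takes time linear in the number of examples. The hardness results concern deciding whether any loading weights exist, not verifying a given set.

```python
from typing import List, Sequence, Tuple

# Hypothetical weight layout for this sketch:
# (w1, b1, w2, b2, c1, c2, c0) = hidden-node weights and biases, then output weights and bias.
Weights = Tuple[Sequence[float], float, Sequence[float], float, float, float, float]
Example = Tuple[Sequence[float], int]  # (input vector, label in {0, 1})


def sat_linear(z: float) -> float:
    """Saturated-linear activation (assumed; the abstract does not name the function)."""
    return min(1.0, max(0.0, z))


def affine(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def three_node_net(x: Sequence[float], weights: Weights) -> int:
    """Two continuous hidden nodes feeding one output node."""
    w1, b1, w2, b2, c1, c2, c0 = weights
    h1 = sat_linear(affine(w1, b1, x))
    h2 = sat_linear(affine(w2, b2, x))
    # Output node: assumed binary threshold on the weighted hidden outputs.
    return 1 if c1 * h1 + c2 * h2 + c0 > 0 else 0


def loads(examples: List[Example], weights: Weights) -> bool:
    """True if the network with these weights labels every example correctly."""
    return all(three_node_net(x, weights) == label for x, label in examples)


# Usage: XOR-labelled points in the plane (positive iff exactly one coordinate is 1).
xor_examples: List[Example] = [((0, 0), 0), ((1, 1), 0), ((0, 1), 1), ((1, 0), 1)]
good: Weights = ((1.0, -1.0), 0.0, (-1.0, 1.0), 0.0, 1.0, 1.0, -0.5)
bad: Weights = ((0.0, 0.0), 0.0, (0.0, 0.0), 0.0, 0.0, 0.0, -1.0)

print(loads(xor_examples, good))  # True
print(loads(xor_examples, bad))   # False: every input is mapped to 0
```

With the `good` weights, the first hidden node responds only when `x1 > x2` and the second only when `x2 > x1`. The output fires if either is active, so the four XOR examples are loaded.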

[1] Saburo Muroga et al. Threshold logic and its applications, 1971.

[2] John T. Gill et al. Computational complexity of probabilistic Turing machines, 1974, STOC '74.

[3] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[4] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979, Freeman.

[5] Kenneth Steiglitz et al. Combinatorial Optimization: Algorithms and Complexity, 1981.

[6] Richard P. Lippmann et al. An introduction to computing with neural nets, 1987.

[7] Leslie G. Valiant et al. On the learnability of Boolean formulae, 1987, STOC.

[8] Joseph W. Goodman et al. On the power of neural networks for solving hard problems, 1990, J. Complex.

[9] R. Lippmann et al. An introduction to computing with neural nets, 1988, IEEE ASSP Magazine.

[10] J. R. Brown et al. Artificial neural network on a SIMD architecture, 1988, Proceedings, 2nd Symposium on the Frontiers of Massively Parallel Computation.

[11] Nimrod Megiddo et al. On the complexity of polyhedral separability, 1988, Discret. Comput. Geom.

[12] J. Stephen Judd et al. On the complexity of loading shallow neural networks, 1988, Journal of Complexity.

[13] David Haussler et al. What Size Net Gives Valid Generalization?, 1989, Neural Computation.

[14] Mihalis Yannakakis et al. On the complexity of local search, 1990, STOC '90.

[15] J. Stephen Judd et al. Neural network design and the complexity of learning, 1990, Neural network modeling and connectionism.

[16] Hans Ulrich Simon et al. On learning ring-sum-expansions, 1990, COLT '90.

[17] Roy Batruni. A multilayer neural network with piecewise-linear structure and back-propagation learning, 1991, IEEE Trans. Neural Networks.

[18] A. Barron. Approximation and Estimation Bounds for Artificial Neural Networks, 1991, COLT '91.

[19] Georg Schnitger et al. On the computational power of sigmoid versus Boolean threshold circuits, 1991, Proceedings 32nd Annual Symposium on Foundations of Computer Science.

[20] Xin Yao. Finding Approximate Solutions to NP-Hard Problems by Neural Networks is Hard, 1992, Inf. Process. Lett.

[21] Ronald L. Rivest et al. Training a 3-node neural network is NP-complete, 1988, Annual Conference on Computational Learning Theory.

[22] Xiao-Dong Zhang. Complexity of Neural Network Learning in the Real Number Model, 1992, Workshop on Physics and Computation.

[23] Hava T. Siegelmann et al. Neural Networks With Real Weights: Analog Computational Complexity, 1992.

[24] L. Jones. A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training, 1992.

[25] Georg Schnitger et al. The Power of Approximation: A Comparison of Activation Functions, 1992, NIPS.

[26] Eduardo D. Sontag et al. Feedforward Nets for Interpolation and Classification, 1992, Journal of Computer and System Sciences.

[27] Eduardo D. Sontag et al. Rate of approximation results motivated by robust neural network learning, 1993, COLT '93.

[28] Paul W. Goldberg et al. Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers, 1993, COLT '93.

[29] Wolfgang Maass et al. Bounds for the computational power and learning complexity of analog neural nets, 1993, STOC '93.

[30] Eduardo D. Sontag et al. Finiteness results for sigmoidal “neural” networks, 1993, STOC '93.

[31] Klaus-Uwe Höffgen et al. Computational Limitations on Training Sigmoid Neural Networks, 1993, Information Processing Letters.

[32] D. Signorini et al. Neural networks, 1995, The Lancet.

[33] Hava T. Siegelmann et al. On the Computational Power of Neural Nets, 1995, J. Comput. Syst. Sci.