On the Problem of Local Minima in Backpropagation

The authors propose a theoretical framework for backpropagation (BP) in order to identify some of its limitations as a general learning procedure and the reasons for its success in several pattern-recognition experiments. The first important conclusion is that examples can be found in which BP gets stuck in local minima. A simple example is presented in which BP can get stuck during gradient descent without having learned the entire training set, even though a solution with zero cost is guaranteed to exist. Some conditions on the network architecture and the learning environment that ensure the convergence of the BP algorithm are proposed. In particular, it is proven that convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that multilayered neural networks (MLNs) outperform perceptrons in generalizing to new examples.
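
The claim that BP can stall even when a zero-cost solution exists can be made concrete with a small numerical sketch (an illustrative toy under assumed details, not the paper's construction): a 2-2-1 sigmoid network trained by batch gradient descent on XOR. The all-zero initialization is a stationary point of the quadratic cost, so the weights never move and the cost stays at 0.5, whereas a generic random initialization typically drives the cost close to zero.

```python
# A minimal sketch (assumed setup, not the paper's construction): batch
# gradient descent on a 2-2-1 sigmoid network for XOR with quadratic cost.
# Weight settings with zero cost exist, yet the all-zero initialization is a
# stationary point of the cost, so BP never moves away from it; a generic
# random initialization typically learns the task.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets

def train(W1, b1, w2, b2, lr=0.5, epochs=20000):
    for _ in range(epochs):
        # forward pass
        h = sigmoid(X @ W1 + b1)        # hidden activations, shape (4, 2)
        out = sigmoid(h @ w2 + b2)      # network outputs, shape (4,)
        # backward pass for the cost 0.5 * sum((out - y) ** 2)
        delta_out = (out - y) * out * (1 - out)
        delta_h = np.outer(delta_out, w2) * h * (1 - h)
        # gradient-descent updates
        w2 = w2 - lr * h.T @ delta_out
        b2 = b2 - lr * delta_out.sum()
        W1 = W1 - lr * X.T @ delta_h
        b1 = b1 - lr * delta_h.sum(axis=0)
    out = sigmoid(sigmoid(X @ W1 + b1) @ w2 + b2)
    return 0.5 * np.sum((out - y) ** 2)

rng = np.random.default_rng(0)
stuck = train(np.zeros((2, 2)), np.zeros(2), np.zeros(2), 0.0)
learned = train(rng.normal(size=(2, 2)), rng.normal(size=2),
                rng.normal(size=2), rng.normal())
print(f"cost from all-zero init: {stuck:.4f}")    # stays at 0.5: all gradients vanish
print(f"cost from random init:   {learned:.4f}")  # typically close to zero
```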
