Error surfaces for multilayer perceptrons

Characteristics of the error surface of the multilayer perceptron are examined to help explain why learning techniques based on hill climbing are so slow in these networks and to provide insight into techniques that can speed learning. First, the surface has a stair-step appearance with many very flat and very steep regions. When the number of training samples is small, there is often a one-to-one correspondence between individual training samples and the steps on the surface; as the number of samples increases, the surface becomes smoother. In addition, the surface has flat regions that extend to infinity in all directions, making it dangerous to apply learning algorithms that perform line searches. The magnitude of the gradients on the surface strongly supports the need for floating-point representations during learning. The consequences of various weight-initialization techniques are also discussed.

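The stair-step structure described above is easy to reproduce in a small numerical experiment. The sketch below is not from the paper; the sample data, weight ranges, and sigmoid gain are all hypothetical. It sweeps two weights of a single high-gain sigmoid unit over a grid and records the mean-squared error on a handful of training samples, producing wide plateaus separated by steep steps, roughly one step per sample.

```python
# Minimal sketch (assumed setup, not the paper's experiment): error surface
# of one sigmoid unit over a 2-D slice of weight space, with few samples.
import numpy as np

def sigmoid(x, gain=10.0):
    # A high gain makes the sigmoid nearly a step, exaggerating the plateaus.
    return 1.0 / (1.0 + np.exp(-gain * x))

# Four hypothetical training samples (2-D inputs, binary targets).
X = np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.5], [0.7, 0.7]])
t = np.array([0.0, 1.0, 0.0, 1.0])

# Sweep the two input weights of the unit; bias held fixed.
w1_grid = np.linspace(-5, 5, 200)
w2_grid = np.linspace(-5, 5, 200)
bias = -0.5

error = np.zeros((len(w1_grid), len(w2_grid)))
for i, w1 in enumerate(w1_grid):
    for j, w2 in enumerate(w2_grid):
        y = sigmoid(X @ np.array([w1, w2]) + bias)
        error[i, j] = np.mean((y - t) ** 2)  # mean-squared error over samples

# With only four samples, the surface shows roughly one "step" per sample
# crossing the decision boundary; adding samples smooths the surface.
print("min/max error over the slice:", error.min(), error.max())
```

Plotting `error` as a contour or surface plot (e.g., with matplotlib) makes the flat and steep regions visible directly; lowering the gain or adding samples smooths the steps, consistent with the behavior summarized in the abstract.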