Backpropagation is Sensitive to Initial Conditions

This paper explores the effect of initial weight selection on feed-forward networks learning simple functions with the back-propagation technique. We first demonstrate, using Monte Carlo techniques, that the magnitude of the initial condition vector (in weight space) is a very significant parameter in convergence-time variability. To understand this result further, we performed additional deterministic experiments, whose results demonstrate the extreme sensitivity of back-propagation to the initial weight configuration.
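The Monte Carlo setup described above can be reproduced in miniature. Below is a minimal sketch, assuming a 2-2-1 sigmoid network trained on XOR with plain gradient descent; the function name train_xor, the learning rate of 0.5, the convergence tolerance, and the set of sampled magnitudes are illustrative assumptions, not the paper's actual experimental parameters. The sketch draws initial weights uniformly from [-scale, scale] at several magnitudes and records how many epochs back-propagation needs to converge, making the dependence of convergence time on initial weight magnitude directly visible.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(rng, scale, lr=0.5, max_epochs=20000, tol=0.05):
    """Train a 2-2-1 network on XOR with plain back-propagation.
    Returns the epoch at which total squared error drops below tol,
    or None if training does not converge within max_epochs."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    # Initial condition vector: all weights drawn uniformly from [-scale, scale]
    W1 = rng.uniform(-scale, scale, (2, 2)); b1 = rng.uniform(-scale, scale, 2)
    W2 = rng.uniform(-scale, scale, (2, 1)); b2 = rng.uniform(-scale, scale, 1)
    for epoch in range(max_epochs):
        H = sigmoid(X @ W1 + b1)   # hidden-layer activations
        Y = sigmoid(H @ W2 + b2)   # network outputs
        E = Y - T
        if np.sum(E ** 2) < tol:
            return epoch
        # Backpropagate the error through the sigmoid derivatives
        dY = E * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return None

# Monte Carlo sweep over initial-weight magnitudes
rng = np.random.default_rng(0)
for scale in (0.1, 0.5, 1.0, 2.0, 4.0):
    times = [train_xor(rng, scale) for _ in range(50)]
    ok = [t for t in times if t is not None]
    print(f"scale={scale:>4}: converged {len(ok)}/50, epoch range "
          f"{min(ok) if ok else '-'}..{max(ok) if ok else '-'}")
```

Running such a sweep typically shows that both the convergence rate and the spread of convergence times vary sharply with the initial weight magnitude, which is the qualitative effect the abstract reports.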
