Dynamic Behavior of Constrained Back-Propagation Networks

The learning dynamics of the back-propagation algorithm are investigated when complexity constraints are added to the standard Least Mean Square (LMS) cost function. It is shown that the loss of generalization performance caused by overtraining can be avoided by using such complexity constraints. Furthermore, the "energy," hidden representations, and weight distributions are observed and compared during learning. An attempt is made to explain the results in terms of linear and non-linear effects in relation to the gradient descent learning algorithm.
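As a rough illustration of what adding a complexity constraint to the LMS cost can look like, the sketch below runs gradient descent on a single linear unit with cost E = E_LMS + lam * C(w). The abstract does not give the form of the constraint, so the bounded penalty C(w) = sum of w^2 / (1 + w^2), the weighting `lam`, the toy data, and all function names are assumptions made for this example only. In a multi-layer back-propagation network the same extra gradient term would presumably be added to each weight's error gradient.

```python
import random

def complexity(weights):
    """Assumed penalty (not from the source): sum of w^2 / (1 + w^2)."""
    return sum(w * w / (1.0 + w * w) for w in weights)

def complexity_grad(weights):
    """Derivative of the assumed penalty: 2w / (1 + w^2)^2 per weight."""
    return [2.0 * w / (1.0 + w * w) ** 2 for w in weights]

def predict(weights, x):
    """Output of a single linear unit."""
    return sum(w * xi for w, xi in zip(weights, x))

def constrained_cost(weights, inputs, targets, lam):
    """LMS error term plus lam times the complexity term."""
    n = len(inputs)
    lms = sum((predict(weights, x) - t) ** 2
              for x, t in zip(inputs, targets)) / (2.0 * n)
    return lms + lam * complexity(weights)

def constrained_grad(weights, inputs, targets, lam):
    """Gradient of the constrained cost with respect to the weights."""
    n = len(inputs)
    grad = [0.0] * len(weights)
    for x, t in zip(inputs, targets):
        err = predict(weights, x) - t
        for i, xi in enumerate(x):
            grad[i] += err * xi / n
    penalty = complexity_grad(weights)
    return [g + lam * p for g, p in zip(grad, penalty)]

def train(inputs, targets, lam, lr=0.05, steps=500):
    """Plain gradient descent starting from zero weights."""
    weights = [0.0] * len(inputs[0])
    for _ in range(steps):
        grad = constrained_grad(weights, inputs, targets, lam)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Toy data (hypothetical): only the first input carries signal, plus noise.
rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
targets = [2.0 * x[0] + rng.gauss(0, 0.1) for x in inputs]

w_plain = train(inputs, targets, lam=0.0)        # standard LMS cost
w_constrained = train(inputs, targets, lam=0.1)  # LMS cost plus penalty
```

Setting `lam=0.0` recovers the unconstrained LMS cost, so the two weight vectors can be compared directly, for example on held-out data.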
