Paper Information - Ill-Conditioning in Neural Network Training Problems

Ill-Conditioning in Neural Network Training Problems

The training problem for feedforward neural networks is a nonlinear parameter estimation problem that can be solved by a variety of optimization techniques. Much of the literature on neural networks has focused on variants of gradient descent. Training neural networks with such techniques is known to be slow, and more sophisticated techniques do not always perform significantly better. This paper shows that feedforward neural networks can have ill-conditioned Hessians and that this ill-conditioning can be quite common. The analysis and experimental results in this paper lead to the conclusion that many network training problems are ill-conditioned and may not be solved more efficiently by higher-order optimization methods. While the analyses used in this paper are for completely connected layered networks, they extend to networks with sparse connectivity as well. The results suggest that neural networks can have considerable redundancy in parameterizing the function space in a neighborhood of a lo...
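The kind of redundancy the abstract describes can be illustrated with a minimal sketch. It is not the paper's analysis; it assumes a hypothetical one-input network with a single tanh hidden layer, a least-squares loss, and the Gauss-Newton approximation J^T J in place of the full Hessian (the two coincide when the residuals are zero). The names `output_jacobian`, `w`, `b`, and `v` are illustrative only. When two hidden units share the same weights, their Jacobian columns coincide and J^T J becomes numerically singular.

```python
import numpy as np

def output_jacobian(x, w, b, v):
    """Jacobian of f(x_i) = sum_j v[j] * tanh(w[j] * x_i + b[j])
    with respect to the parameters, ordered as (w, b, v)."""
    h = np.tanh(np.outer(x, w) + b)   # hidden activations, shape (n, m)
    dh = 1.0 - h ** 2                 # derivative of tanh at each activation
    d_w = dh * v * x[:, None]         # d f / d w_j
    d_b = dh * v                      # d f / d b_j
    d_v = h                           # d f / d v_j
    return np.hstack([d_w, d_b, d_v]) # shape (n, 3 * m)

# Hypothetical data and parameters: hidden units 0 and 1 are exact duplicates.
x = np.linspace(-2.0, 2.0, 50)
w = np.array([1.0, 1.0, -0.5])
b = np.array([0.2, 0.2, 0.1])
v = np.array([0.7, 0.7, -1.0])

J = output_jacobian(x, w, b, v)

# Squared singular values of J are the eigenvalues of J^T J (descending order).
eigs = np.linalg.svd(J, compute_uv=False) ** 2

print("largest eigenvalue of J^T J: ", eigs[0])
print("smallest eigenvalue of J^T J:", eigs[-1])
```

The ratio of the largest to the smallest eigenvalue is the condition number of J^T J. In this constructed case the smallest eigenvalue is zero up to rounding, so the ratio is effectively unbounded; nearly duplicated units would be expected to give a large but finite value.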

[1] James M. Ortega, et al. Iterative solution of nonlinear equations in several variables, 2014, Computer science and applied mathematics.

[2] E. Polak. Introduction to linear and nonlinear programming, 1973.

[3] M. J. D. Powell, et al. Restart procedures for the conjugate gradient method, 1977, Math. Program.

[4] Jorge J. Moré, et al. The Levenberg-Marquardt algorithm: Implementation and theory, 1977.

[5] Philip E. Gill, et al. Practical optimization, 1981.

[6] John E. Dennis, et al. Numerical methods for unconstrained optimization and nonlinear equations, 1983, Prentice Hall series in computational mathematics.

[7] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[8] C. Fraley. Solution of nonlinear least-squares problems, 1987.

[9] M. J. D. Powell, et al. Radial basis functions for multivariable interpolation: a review, 1987.

[10] Terrence J. Sejnowski, et al. Parallel Networks that Learn to Pronounce English Text, 1987, Complex Syst.

[11] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[12] M. Buhmann. Multivariate interpolation in odd-dimensional euclidean spaces using multiquadrics, 1990.

[13] Anders Krogh, et al. Introduction to the theory of neural computation, 1994, The advanced book program.