Optimization for training neural nets

Various techniques for optimizing criterion functions to train neural-net classifiers are investigated. These include three standard deterministic techniques (variable metric, conjugate gradient, and steepest descent) and a new stochastic technique. The stochastic technique is found to be preferable on problems with large training sets, and the variable metric and conjugate gradient techniques are found to converge at similar rates.
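
The four kinds of method named above can be compared on a problem small enough to check by hand. The Python sketch below is a minimal illustration under stated assumptions, not the paper's experimental setup. The criterion is the mean squared error of a single linear unit on four hypothetical training points, so its Hessian is diag(1, 100). That makes exact line searches possible; a neural-net criterion is not quadratic and would need an approximate line search. Three further choices are assumptions: Polak-Ribière conjugate directions with restarts, the BFGS update as the variable-metric method, and plain fixed-step per-sample gradient descent as the stochastic method. That last one is not the paper's new stochastic technique. All names (`TRUE_W`, `DATA`, `line_min`, and so on) are hypothetical.

```python
import random

# Hypothetical toy problem (not from the paper): a linear unit, criterion
# E(w) = (1/2N) sum_i (w.x_i - y_i)^2, Hessian diag(1, 100), worst-case start.
TRUE_W = [1.0, 0.01]
DATA = [([a, b], a * TRUE_W[0] + b * TRUE_W[1])
        for a in (-1.0, 1.0) for b in (-10.0, 10.0)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def axpy(a, x, y):  # returns y + a*x for plain lists
    return [yi + a * xi for xi, yi in zip(x, y)]


def loss(w, data=DATA):
    return sum((dot(w, x) - y) ** 2 for x, y in data) / (2 * len(data))


def grad(w, data=DATA):
    g = [0.0] * len(w)
    for x, y in data:
        g = axpy((dot(w, x) - y) / len(data), x, g)
    return g


def line_min(g, d, data=DATA):
    """Exact minimizer a of E(w + a*d); valid only because E is quadratic."""
    curvature = sum(dot(d, x) ** 2 for x, _ in data) / len(data)  # d^T H d
    return -dot(g, d) / curvature


def steepest_descent(w, iters):
    for _ in range(iters):
        g = grad(w)
        if dot(g, g) < 1e-24:
            break
        d = [-gi for gi in g]
        w = axpy(line_min(g, d), d, w)
    return w


def conjugate_gradient(w, iters):
    g = grad(w)
    d = [-gi for gi in g]
    for _ in range(iters):
        if dot(g, g) < 1e-24:
            break
        w = axpy(line_min(g, d), d, w)
        g_new = grad(w)
        # Polak-Ribiere coefficient, clipped at 0 (which restarts along -g).
        beta = max(0.0, dot(g_new, axpy(-1.0, g, g_new)) / dot(g, g))
        d = axpy(beta, d, [-gi for gi in g_new])
        g = g_new
    return w


def bfgs(w, iters):
    """Variable-metric method: BFGS update of an inverse-Hessian estimate H."""
    n = len(w)
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    g = grad(w)
    for _ in range(iters):
        if dot(g, g) < 1e-24:
            break
        d = [-dot(row, g) for row in H]
        a = line_min(g, d)
        s = [a * di for di in d]
        w = axpy(1.0, s, w)
        g_new = grad(w)
        y = axpy(-1.0, g, g_new)
        rho = 1.0 / dot(y, s)
        Hy = [dot(row, y) for row in H]
        c = 1.0 + rho * dot(y, Hy)
        # Expanded form of H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
        H = [[H[i][j] + rho * (c * s[i] * s[j] - Hy[i] * s[j] - s[i] * Hy[j])
              for j in range(n)] for i in range(n)]
        g = g_new
    return w


def stochastic_gradient(w, epochs, lr=0.005, seed=0):
    """Plain per-sample SGD with a fixed step; not the paper's stochastic method."""
    rng = random.Random(seed)
    data = list(DATA)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:  # update after every sample, not after a full pass
            w = axpy(-lr * (dot(w, x) - y), x, w)
    return w


if __name__ == "__main__":
    print("steepest descent:  ", loss(steepest_descent([0.0, 0.0], 10)))
    print("conjugate gradient:", loss(conjugate_gradient([0.0, 0.0], 10)))
    print("BFGS:              ", loss(bfgs([0.0, 0.0], 10)))
    print("per-sample SGD:    ", loss(stochastic_gradient([0.0, 0.0], 200)))
```

Run as a script, steepest descent zigzags between two directions and still has a criterion value of roughly 0.34 after 10 iterations. Conjugate gradient and BFGS reach the minimum after two line searches. On a quadratic with exact line searches, BFGS started from the identity is known to generate the same iterates as conjugate gradient. Every gradient evaluation in the deterministic methods needs a full pass over the training set. `stochastic_gradient` instead changes the weights after every sample, which is where a stochastic method can gain when the training set is large.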