Optimization schemes for neural networks

Training neural networks need not be a slow, computationally expensive process. The reason it is often seen as such may be the traditional emphasis on gradient descent as the optimization method. Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes an improvement to conjugate gradient descent that avoids line searches along the conjugate search directions. It makes use of a variant of backprop (Rumelhart et al., 1986), called rbackprop (Pearlmutter, 1993), which can compute the product of the Hessian of the error with respect to the weights and an arbitrary vector. The calculation is exact and computationally cheap. The report is in the nature of a tutorial. Gradient descent is reviewed and the back-propagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described: conjugate gradient descent, scaled conjugate gradient descent, delta-bar-delta, RProp, and Quickprop. All six optimization schemes are tested on various tasks and various types of networks. The results show that scaled conjugate gradient descent and Quickprop are effective optimization schemes for a variety of problems.
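
The following is a minimal sketch, not the rbackprop implementation from the report, of how an exact Hessian-vector product can be obtained at roughly the cost of a couple of gradient evaluations, here using forward-over-reverse differentiation in JAX. The loss function, network, and data are illustrative placeholders chosen for brevity.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Tiny one-layer network with a squared-error loss (placeholder only).
        pred = jnp.tanh(x @ w)
        return jnp.mean((pred - y) ** 2)

    def hvp(w, v, x, y):
        # Exact Hessian-vector product H(w) @ v: differentiate the gradient
        # of the loss along the direction v (forward-mode over reverse-mode).
        grad_fn = lambda w_: jax.grad(loss)(w_, x, y)
        return jax.jvp(grad_fn, (w,), (v,))[1]

    # Usage with random placeholder data.
    w = jax.random.normal(jax.random.PRNGKey(0), (5,))
    v = jax.random.normal(jax.random.PRNGKey(1), (5,))
    x = jax.random.normal(jax.random.PRNGKey(2), (8, 5))
    y = jnp.zeros((8,))
    print(hvp(w, v, x, y))

Such a product is what allows a scaled conjugate gradient scheme to gather curvature information along a search direction without forming the full Hessian and without explicit line searches.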