Optimization schemes for neural networks

Training neural networks need not be a slow, computationally expensive process. The reason it is often seen as such may be the traditional emphasis on gradient descent as the optimization method. Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes an improvement to conjugate gradient descent that avoids line searches along the conjugate search directions. It makes use of a variant of backprop (Rumelhart et al., 1986), called rbackprop (Pearlmutter, 1993), which can compute the product of the Hessian of the error with respect to the weights and an arbitrary vector. The calculation is exact and computationally cheap. The report is in the nature of a tutorial. Gradient descent is reviewed and the back-propagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described: conjugate gradient descent, scaled conjugate gradient descent, delta-bar-delta, RProp, and Quickprop. All six optimization schemes are tested on various tasks and various types of networks. The results show that scaled conjugate gradient descent and Quickprop are effective optimization schemes for a variety of problems.
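
The following is a minimal sketch, not the rbackprop implementation from the report, of how an exact Hessian-vector product can be obtained at roughly the cost of a couple of gradient evaluations, here using forward-over-reverse differentiation in JAX. The loss function, network, and data are illustrative placeholders chosen for brevity.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Tiny one-layer network with a squared-error loss (placeholder only).
        pred = jnp.tanh(x @ w)
        return jnp.mean((pred - y) ** 2)

    def hvp(w, v, x, y):
        # Exact Hessian-vector product H(w) @ v: differentiate the gradient
        # of the loss along the direction v (forward-mode over reverse-mode).
        grad_fn = lambda w_: jax.grad(loss)(w_, x, y)
        return jax.jvp(grad_fn, (w,), (v,))[1]

    # Usage with random placeholder data.
    w = jax.random.normal(jax.random.PRNGKey(0), (5,))
    v = jax.random.normal(jax.random.PRNGKey(1), (5,))
    x = jax.random.normal(jax.random.PRNGKey(2), (8, 5))
    y = jnp.zeros((8,))
    print(hvp(w, v, x, y))

Such a product is what allows a scaled conjugate gradient scheme to gather curvature information along a search direction without forming the full Hessian and without explicit line searches.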