Local Gain Adaptation in Stochastic Gradient Descent

Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
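
To make the gradient-correlation idea concrete, the following is a minimal sketch (not the paper's own algorithm) of the classic delta-bar-delta-style scheme the abstract refers to: each weight keeps its own learning rate ("gain"), which grows when successive stochastic gradients agree in sign and shrinks when they disagree. Function and parameter names here (`sgd_with_local_gains`, `eta0`, `up`, `down`) are illustrative assumptions, as is the toy quadratic objective.

```python
# Illustrative sketch of per-parameter gain adaptation driven by the sign
# correlation of successive gradients (delta-bar-delta / SuperSAB style).
# This is NOT the algorithm proposed in the paper; it is the baseline approach
# whose limitations the paper discusses. All names and constants are assumptions.
import numpy as np

def sgd_with_local_gains(grad_fn, w, steps=1000,
                         eta0=0.01, up=1.05, down=0.7):
    """Stochastic gradient descent with one adaptive gain per parameter.

    grad_fn(w) returns a stochastic gradient estimate at parameters w.
    """
    gains = np.full_like(w, eta0)      # one learning rate per parameter
    prev_grad = np.zeros_like(w)       # previous gradient, for the correlation test
    for _ in range(steps):
        g = grad_fn(w)
        agree = g * prev_grad > 0      # do successive gradients point the same way?
        gains = np.where(agree, gains * up, gains)     # grow gain on agreement
        gains = np.where(~agree & (prev_grad != 0),
                         gains * down, gains)          # shrink gain on disagreement
        w = w - gains * g              # per-parameter update
        prev_grad = g
    return w

# Usage: minimise a noisy, badly conditioned quadratic, where per-weight
# gains are known to help over a single global learning rate.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0, 100.0])
    noisy_grad = lambda w: A @ w + 0.1 * rng.standard_normal(w.shape)
    w_final = sgd_with_local_gains(noisy_grad, np.ones(3))
    print(w_final)                     # should end up close to the zero vector
```

Note that this baseline treats each gradient sample as if it were independent of the last; the paper's contribution is an online alternative that drops that assumption and the ad hoc smoothing constants (`up`, `down` above).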
