Online Learning with Adaptive Local Step Sizes

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
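
The linear-case starting point mentioned above is Sutton's IDBD [13], which keeps one log step size per weight and adapts it by gradient descent on the step sizes themselves. As a point of reference only, here is a minimal NumPy sketch of that linear IDBD update, not the nonlinear algorithm developed in this paper; the function and variable names are illustrative.

```python
import numpy as np

def idbd(X, y, theta=0.01, beta_init=np.log(0.05)):
    """Sutton's IDBD: per-weight step sizes adapted online by meta-gradient descent.

    X : (T, n) array of input vectors, y : (T,) array of targets.
    theta is the meta-learning rate; beta_init the initial log step size.
    """
    n = X.shape[1]
    w = np.zeros(n)                  # weights of the linear predictor
    beta = np.full(n, beta_init)     # log of each weight's local step size
    h = np.zeros(n)                  # decaying trace of recent weight changes

    for x, target in zip(X, y):
        delta = target - w @ x            # prediction error on this pattern
        beta += theta * delta * x * h     # meta-gradient step on the log step sizes
        alpha = np.exp(beta)              # current local step sizes
        w += alpha * delta * x            # LMS update with per-weight gains
        h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, np.exp(beta)
```

Calling `w, alphas = idbd(X, y)` on a stream of (x, y) pairs returns the learned weights together with the per-weight step sizes the run settled on; the paper's contribution is to carry this idea over to general nonlinear systems.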

[1] Frank Fallside et al. An adaptive training algorithm for back propagation networks, 1987.

[2] Nicol N. Schraudolph et al. Online Local Gain Adaptation for Multi-Layer Perceptrons, 1998.

[3] J. van Leeuwen et al. Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, 2002.

[4] Barak A. Pearlmutter et al. Automatic Learning Rate Maximization in Large Adaptive Machines, NIPS, 1992.

[5] Sharad Singhal et al. Training Multilayer Perceptrons with the Extended Kalman Algorithm, NIPS, 1988.

[6] Martin A. Riedmiller et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, 1993.

[7] Nicol N. Schraudolph et al. A Fast, Compact Approximation of the Exponential Function, Neural Computation, 1999.

[8] Manfred K. Warmuth et al. Additive versus exponentiated gradient updates for linear prediction, STOC '95, 1995.

[9] Tom Tollenaere et al. SuperSAB: Fast adaptive back propagation with good scaling properties, Neural Networks, 1990.

[10] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, Neural Computation, 1994.

[11] Terrence J. Sejnowski et al. Tempering Backpropagation Networks: Not All Weights are Created Equal, NIPS, 1995.

[12] Luís B. Almeida et al. Speeding up Backpropagation, 1990.

[13] Richard S. Sutton et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, AAAI, 1992.

[14] Thibault Langlois et al. Parameter adaptation in stochastic optimization, 1999.

[15] Robert A. Jacobs et al. Increased rates of convergence through learning rate adaptation, Neural Networks, 1987.

[16] Ralph Neuneier et al. How to Train Neural Networks, in Neural Networks: Tricks of the Trade, 1996.

[17] William H. Press et al. Numerical Recipes in C++: The Art of Scientific Computing, 2nd Edition, 1994.

[18] Yann LeCun et al. Improving the convergence of back-propagation learning with second-order methods, 1989.

[19] Andreas Ziehe et al. Adaptive On-line Learning in Changing Environments, NIPS, 1996.

[20] William H. Press et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.