Fast Second-Order Gradient Descent via O(n) Curvature Matrix-Vector Products

We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration. It relies on special curvature matrix-vector products that can be computed in O(n), where n is the number of parameters. Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. The two were originally derived by very different routes, so recognizing their common basis offers fresh insight into how they operate and leads to further improvements to SMD.
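The following is a minimal sketch of the core idea under stated assumptions. A second-order step v = -C^{-1} g, where C is a curvature matrix and g is the gradient, is the fixed point of the iteration v <- v - mu (C v + g). Each iteration therefore needs only one product C v, and C is never formed or inverted. The sketch assumes a hypothetical one-neuron model y = tanh(w . x) with squared-error loss. It uses the Gauss-Newton matrix G, or G + lam*I for a Levenberg-Marquardt step, and computes G v as J^T (J v) at O(n) cost per sample. The function names (forward, gradient, curvature_vp, approx_second_order_step), the values of lam, mu, and iters, and the plain fixed-point loop itself are illustrative assumptions, not the paper's own implementation.

```python
import math


def forward(w, x):
    """Hypothetical one-neuron model: y = tanh(w . x)."""
    return math.tanh(sum(wj * xj for wj, xj in zip(w, x)))


def gradient(w, xs, ts):
    """Gradient of the mean loss 0.5 * (y - t)^2 with respect to w."""
    m = len(xs)
    g = [0.0] * len(w)
    for x, t in zip(xs, ts):
        y = forward(w, x)
        r = (y - t) * (1.0 - y * y)              # dL / d(w . x)
        for j, xj in enumerate(x):
            g[j] += r * xj / m
    return g


def curvature_vp(w, xs, v, lam=0.0):
    """Return (G + lam * I) v, with G the Gauss-Newton matrix of the mean loss.

    Per sample, J_i = (1 - y_i^2) x_i^T, so J_i v is a scalar (forward pass)
    and J_i^T (J_i v) is a vector (backward pass): O(n) work per sample, and
    the n x n matrix G is never built. lam > 0 gives Levenberg-Marquardt.
    """
    m = len(xs)
    out = [lam * vj for vj in v]
    for x in xs:
        y = forward(w, x)
        d = 1.0 - y * y                                  # dy / d(w . x)
        s = d * sum(xj * vj for xj, vj in zip(x, v))     # J_i v
        for j, xj in enumerate(x):
            out[j] += d * xj * s / m                     # J_i^T (J_i v), averaged
    return out


def approx_second_order_step(w, xs, ts, lam=0.1, mu=0.2, iters=2000):
    """Iterate v <- v - mu * (C v + g), whose fixed point is v = -C^{-1} g.

    Converges when C is positive definite and mu < 2 / lambda_max(C);
    each iteration costs a single curvature matrix-vector product.
    """
    g = gradient(w, xs, ts)
    v = [0.0] * len(w)
    for _ in range(iters):
        cv = curvature_vp(w, xs, v, lam)
        v = [vj - mu * (cvj + gj) for vj, cvj, gj in zip(v, cv, g)]
    return v, g


if __name__ == "__main__":
    # Small illustrative dataset (made up for this sketch).
    w = [0.1, -0.2, 0.05]
    xs = [[1.0, 0.5, -0.3], [0.2, -1.0, 0.4], [-0.7, 0.3, 0.9], [0.5, 0.5, 0.5]]
    ts = [0.3, -0.5, 0.8, 0.1]
    step, g = approx_second_order_step(w, xs, ts)
    print("gradient:", g)
    print("approximate Levenberg-Marquardt step:", step)
```

With lam > 0, C = G + lam*I is positive definite. The chosen mu is also well below 2 / lambda_max for this data, so the loop contracts toward -C^{-1} g. A Newton step would substitute a Hessian-vector product for curvature_vp. The Hessian can be indefinite, however, and in that case this plain iteration need not converge. The abstract states that matrix momentum and SMD implement this approach in the online setting. This batch sketch does not reproduce either method.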