Fast Curvature Matrix-Vector Products

The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions used in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to, but even cheaper than, the fast Hessian-vector product [1]. The stability of SMD [2,3,4,5], a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
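To illustrate the structure of such a product, the following is a minimal sketch in JAX of a Gauss-Newton-vector product Gv = Jᵀ H_L J v, realized as one forward-mode pass, one Hessian-vector product in the (small) output space, and one reverse-mode pass. The names `gauss_newton_vp`, `f`, and `loss` are placeholders introduced here for illustration; this is not the paper's exact algorithm, only one way to realize the same matrix-vector product with automatic differentiation.

```python
import jax
import jax.numpy as jnp

def gauss_newton_vp(f, loss, params, x, y, v):
    """Gauss-Newton-vector product G v, with G = J^T H_L J:
    J is the Jacobian of the model outputs w.r.t. the parameters,
    H_L the Hessian of the loss w.r.t. the model outputs.
    Cost is comparable to a few gradient evaluations."""
    # u = J v: forward-mode directional derivative of the outputs.
    out, u = jax.jvp(lambda p: f(p, x), (params,), (v,))
    # w = H_L u: Hessian-vector product in output space. For matched
    # loss/output pairs (e.g. softmax with cross-entropy) this inner
    # Hessian is positive semi-definite, which makes G itself PSD.
    w = jax.jvp(jax.grad(lambda o: loss(o, y)), (out,), (u,))[1]
    # G v = J^T w: reverse-mode pass pulls w back to parameter space.
    _, pullback = jax.vjp(lambda p: f(p, x), params)
    return pullback(w)[0]

# Hypothetical usage: a linear model with squared-error loss, where
# G coincides with the exact Hessian x.T @ x.
f = lambda p, x: x @ p
loss = lambda o, y: 0.5 * jnp.sum((o - y) ** 2)
x = jnp.arange(6.0).reshape(3, 2)
params = jnp.array([1.0, -1.0])
v = jnp.array([0.5, 2.0])
y = jnp.zeros(3)
print(gauss_newton_vp(f, loss, params, x, y, v))  # equals (x.T @ x) @ v
```

Note that the loss Hessian H_L is taken with respect to the network outputs rather than the parameters, so the inner Hessian-vector product is cheap and, for the standard loss functions named above, guarantees positive semi-definiteness of G regardless of how nonlinear the network itself is.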