Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time

Several methods for training feed-forward neural networks require second-order information from the Hessian matrix of the error function. Although it is possible to calculate the Hessian matrix exactly, doing so is often undesirable because of the computation and memory requirements involved. However, some learning techniques only need the product of the Hessian matrix and a vector. This paper presents a method to calculate the product of the Hessian matrix and a vector in O(N) time, where N is the number of variables in the network. This is the same order as the calculation of the gradient of the error function. The usefulness of this algorithm is demonstrated by improvements to existing learning techniques.
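As a rough illustration of the idea (not the paper's own derivation), the sketch below computes an exact Hessian-vector product for a small feed-forward network's sum-of-squares error at the cost of a constant factor times one gradient evaluation, i.e. O(N) in the number of weights. It uses forward-over-reverse automatic differentiation in JAX as a stand-in for the paper's algorithm; the network architecture, data, and names such as `error` and `hessian_vector_product` are illustrative assumptions.

```python
# Minimal sketch, assuming a one-hidden-layer network and sum-of-squares error.
# Hv is obtained as the directional derivative of the gradient:
#   H(w) v = d/dr [ grad E(w + r*v) ] at r = 0,
# which forward-over-reverse autodiff evaluates exactly in O(N) time.
import jax
import jax.numpy as jnp

def error(params, x, t):
    # Sum-of-squares error of an assumed 4-8-2 tanh network.
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    y = h @ W2 + b2
    return 0.5 * jnp.sum((y - t) ** 2)

def hessian_vector_product(params, v, x, t):
    # jax.jvp of the gradient function gives H @ v without forming H.
    grad_fn = lambda p: jax.grad(error)(p, x, t)
    _, hv = jax.jvp(grad_fn, (params,), (v,))
    return hv

# Illustrative data and direction vector v (same structure as params).
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (jax.random.normal(k1, (4, 8)), jnp.zeros(8),
          jax.random.normal(k2, (8, 2)), jnp.zeros(2))
x = jax.random.normal(k3, (16, 4))
t = jnp.zeros((16, 2))
v = jax.tree_util.tree_map(jnp.ones_like, params)

print(hessian_vector_product(params, v, x, t))
```

The product H v returned here has the same structure as the parameters, so it can be fed directly to learning techniques that need Hessian-vector products without ever storing the full N-by-N Hessian.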