On "Natural" Learning and Pruning in Multilayered Perceptrons

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster natural learning algorithms and more robust pruning procedures.
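The two central ingredients the abstract names, the natural-gradient update w ← w − η F⁻¹∇E and a layered (per-layer, block-diagonal) approximation of the Fisher matrix F, can be sketched in a few lines of numpy. The sketch below is an illustration under stated assumptions, not the paper's exact algorithm: it builds an empirical Fisher from per-example gradients of the squared error, adds a small damping term (`damp`, our addition) to keep each block invertible, and the network size and toy regression task are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP for regression: 1 input -> H tanh hidden units -> 1 linear output.
H = 5
W1 = rng.normal(scale=0.5, size=(H, 2))      # hidden weights, last column = bias
W2 = rng.normal(scale=0.5, size=(1, H + 1))  # output weights, last entry = bias

def forward(x):
    """Return input-with-bias, hidden activations (with bias), and output."""
    a = np.column_stack([x, np.ones_like(x)])    # (N, 2)
    h = np.tanh(a @ W1.T)                        # (N, H)
    hb = np.column_stack([h, np.ones(len(x))])   # (N, H+1)
    y = hb @ W2.T                                # (N, 1)
    return a, h, hb, y

def per_example_grads(x, t):
    """Per-example gradients of the squared error, one block per layer."""
    a, h, hb, y = forward(x)
    err = y - t[:, None]                         # (N, 1) output error
    g2 = err * hb                                # (N, H+1): grads w.r.t. W2
    back = (err @ W2[:, :H]) * (1.0 - h**2)      # (N, H): backpropagated signal
    g1 = back[:, :, None] * a[:, None, :]        # (N, H, 2): grads w.r.t. W1
    return g1.reshape(len(x), -1), g2

def natural_step(x, t, eta=0.1, damp=1e-3):
    """One batch natural-gradient step with a per-layer (block-diagonal)
    empirical Fisher matrix instead of the full Fisher."""
    g1, g2 = per_example_grads(x, t)
    for g, W in ((g1, W1), (g2, W2)):
        # Empirical Fisher block for this layer, damped for invertibility.
        F = g.T @ g / len(x) + damp * np.eye(g.shape[1])
        step = np.linalg.solve(F, g.mean(axis=0))  # F^{-1} times mean gradient
        W -= eta * step.reshape(W.shape)           # in-place parameter update

# Fit y = sin(x) on a fixed batch.
x = rng.uniform(-2.0, 2.0, size=64)
t = np.sin(x)
for _ in range(200):
    natural_step(x, t)
print("final MSE:", float(np.mean((forward(x)[3][:, 0] - t) ** 2)))
```

Because each Fisher block only has a layer's worth of parameters, inverting it is far cheaper than inverting the full Fisher matrix, which is the source of the speedup the abstract claims. The same damped blocks F could also stand in for the Hessian in an optimal-brain-surgeon saliency, w_q² / (2 [F⁻¹]_qq), which is the kind of link between the Fisher matrix and pruning that the abstract draws.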
