On Centering Neural Network Weight Updates

It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph and Sejnowski, 1996). Here we generalize this notion to all factors involved in the weight update, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
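The following is a minimal NumPy sketch of the idea (not the authors' implementation). It assumes a single hidden layer of tanh units with a shortcut connection from inputs to outputs, exponential running averages with decay `gamma`, and a plain stochastic gradient step of size `lr`; all variable names are illustrative. Activities, error signals, and activation slopes each have a running mean subtracted before entering the weight update, while the bias weights receive the uncentered errors so that the removed means are not lost.

```python
# Hypothetical sketch of centered weight updates for a network with shortcut
# connections; `lr` and `gamma` are illustrative choices, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 8, 1
W1 = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
W2 = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output
Ws = rng.normal(0, 0.1, (n_out, n_in))   # shortcut: input -> output
b1 = np.zeros(n_hid)
b2 = np.zeros(n_out)

lr, gamma = 0.1, 0.9                     # learning rate, running-average decay
mu_x = np.zeros(n_in)                    # running mean of inputs
mu_h = np.zeros(n_hid)                   # running mean of hidden activities
mu_s = np.zeros(n_hid)                   # running mean of activation slopes
mu_d2 = np.zeros(n_out)                  # running mean of output errors
mu_d1 = np.zeros(n_hid)                  # running mean of hidden errors

def step(x, target):
    """One stochastic update on pattern (x, target); returns the squared error."""
    global W1, W2, Ws, b1, b2, mu_x, mu_h, mu_s, mu_d2, mu_d1
    # Forward pass.
    a = W1 @ x + b1
    h = np.tanh(a)
    s = 1.0 - h**2                       # activation slope tanh'(a)
    y = W2 @ h + Ws @ x + b2

    # Backward pass for squared error. Slope centering subtracts the mean slope,
    # so the linear component of the backpropagated error is left to the shortcuts.
    d2 = target - y                      # output error signal
    mu_s = gamma * mu_s + (1 - gamma) * s
    d1 = (s - mu_s) * (W2.T @ d2)        # slope-centered hidden error

    # Update running means of activities and error signals.
    mu_x = gamma * mu_x + (1 - gamma) * x
    mu_h = gamma * mu_h + (1 - gamma) * h
    mu_d2 = gamma * mu_d2 + (1 - gamma) * d2
    mu_d1 = gamma * mu_d1 + (1 - gamma) * d1

    # Centered updates: both error signals and presynaptic activities have their
    # running means removed; the bias weights absorb the uncentered components.
    W2 += lr * np.outer(d2 - mu_d2, h - mu_h)
    Ws += lr * np.outer(d2 - mu_d2, x - mu_x)
    W1 += lr * np.outer(d1 - mu_d1, x - mu_x)
    b2 += lr * d2
    b1 += lr * d1
    return float(0.5 * np.sum((target - y) ** 2))
```

Under these assumptions, subtracting the mean slope in `d1` diverts the linear part of the backpropagated error away from the hidden weights `W1` and onto the shortcut weights `Ws`, which is the credit-assignment effect described above; the hidden units are then trained only on the nonlinear residual.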

[1] B. Widrow et al. Stationary and nonstationary learning characteristics of the LMS adaptive filter, 1976, Proceedings of the IEEE.

[2] T. Sejnowski et al. Storing covariance with nonlinearly interacting neurons, 1977, Journal of Mathematical Biology.

[3] E. Bienenstock et al. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex, 1982, The Journal of Neuroscience.

[4] Alan S. Lapedes et al. A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition, 1986.

[5] James A. Anderson et al. Neurocomputing: Foundations of Research, 1988.

[6] Roberto Battiti et al. Accelerated Backpropagation Learning: Two Optimization Methods, 1989, Complex Systems.

[7] Kanter et al. Eigenvalues of covariance matrices: Application to neural-network learning, 1991, Physical Review Letters.

[8] Nathan Intrator et al. Feature Extraction Using an Unsupervised Neural Network, 1992, Neural Computation.

[9] Terrence J. Sejnowski et al. Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain, 1992, NIPS.

[10] Roberto Battiti et al. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, 1992, Neural Computation.

[11] Peter D. Turney. Exploiting Context When Learning to Classify, 1993, ECML.

[12] Michael Finke et al. Estimating A-Posteriori Probabilities using Stochastic Network Models, 1993.

[13] Terrence J. Sejnowski et al. Tempering Backpropagation Networks: Not All Weights are Created Equal, 1995, NIPS.

[14] Robert Tibshirani et al. Discriminant Adaptive Nearest Neighbor Classification, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[15] Joshua B. Tenenbaum et al. Separating Style and Content, 1996, NIPS.