Centering Neural Network Gradient Factors

It has long been known that neural networks can learn faster when their input and hidden unit activity is centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph & Sejnowski, 1996). Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
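
To make the gradient factorization concrete: in standard backpropagation the gradient for a weight w_ij from unit i to unit j factors as dE/dw_ij = x_i * f'(net_j) * delta_j, where x_i is the presynaptic activity, f'(net_j) the slope of unit j's activation function, and delta_j the backpropagated error; centering replaces each factor by its deviation from an average. The following is a minimal sketch, not the paper's exact algorithm, of one way to combine activity, error, and slope centering in a single-hidden-layer network with shortcut connections. It assumes tanh units, plain stochastic gradient descent, and exponential running means as the centering estimates; all names, sizes, and constants are illustrative.

    import numpy as np

    # Sketch of activity, error, and slope centering in a one-hidden-layer
    # net with shortcut connections. Exponential running means stand in
    # for the averages used for centering; layer sizes, learning rate,
    # and decay constant are illustrative assumptions.
    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 4, 8, 2
    lr, gamma = 0.1, 0.99                      # learning rate, running-mean decay

    W1 = rng.normal(0.0, 0.1, (n_hid, n_in))   # input -> hidden
    W2 = rng.normal(0.0, 0.1, (n_out, n_hid))  # hidden -> output
    Ws = rng.normal(0.0, 0.1, (n_out, n_in))   # shortcut: input -> output

    x_bar = np.zeros(n_in)    # running mean of input activity
    h_bar = np.zeros(n_hid)   # running mean of hidden activity
    d_bar = np.zeros(n_hid)   # running mean of backpropagated error
    s_bar = np.zeros(n_hid)   # running mean of activation slope

    def train_step(x, target):
        global W1, W2, Ws, x_bar, h_bar, d_bar, s_bar
        # forward pass; the shortcut carries the linear input-output map
        h = np.tanh(W1 @ x)
        y = W2 @ h + Ws @ x
        e = target - y

        # the gradient factors at the hidden layer
        slope = 1.0 - h ** 2        # tanh'(net)
        delta = W2.T @ e            # backpropagated error

        # update the running mean of every factor
        x_bar = gamma * x_bar + (1.0 - gamma) * x
        h_bar = gamma * h_bar + (1.0 - gamma) * h
        d_bar = gamma * d_bar + (1.0 - gamma) * delta
        s_bar = gamma * s_bar + (1.0 - gamma) * slope

        # centered factors: activity, error, and slope centering
        xc, hc = x - x_bar, h - h_bar
        dc, sc = delta - d_bar, slope - s_bar

        # gradient-descent updates built from the centered factors
        W1 += lr * np.outer(dc * sc, xc)
        W2 += lr * np.outer(e, hc)
        Ws += lr * np.outer(e, xc)  # shortcut absorbs the linear part
        return float(e @ e)         # squared error for monitoring

Because the mean slope is subtracted out, the hidden-layer update no longer carries the linear component of the backpropagated error; that component is handled by the direct shortcut weights instead, which is the credit-assignment improvement the abstract refers to.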

[1]  B. Widrow et al. Stationary and nonstationary learning characteristics of the LMS adaptive filter, 1976, Proceedings of the IEEE.

[2]  T. Sejnowski et al. Storing covariance with nonlinearly interacting neurons, 1977, Journal of Mathematical Biology.

[3]  E. Bienenstock et al. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex, 1982, The Journal of Neuroscience.

[4]  Alan S. Lapedes et al. A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition, 1986.

[5]  James A. Anderson et al. Neurocomputing: Foundations of Research, 1988.

[6]  Roberto Battiti et al. Accelerated Backpropagation Learning: Two Optimization Methods, 1989, Complex Systems.

[7]  Y. Le Cun, I. Kanter, and S. A. Solla. Eigenvalues of covariance matrices: Application to neural-network learning, 1991, Physical Review Letters.

[8]  Nathan Intrator et al. Feature Extraction Using an Unsupervised Neural Network, 1992, Neural Computation.

[9]  Terrence J. Sejnowski et al. Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain, 1992, NIPS.

[10]  Francesco Palmieri et al. Optimal filtering algorithms for fast learning in feedforward neural networks, 1992, Neural Networks.

[11]  Roberto Battiti. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, 1992, Neural Computation.

[12]  Peter D. Turney. Exploiting Context When Learning to Classify, 1993, ECML.

[13]  Pavel B. Brazdil et al. Machine Learning: ECML-93, 1993, Lecture Notes in Computer Science.

[14]  Michael Finke et al. Estimating A-Posteriori Probabilities using Stochastic Network Models, 1993.

[15]  Terrence J. Sejnowski et al. Tempering Backpropagation Networks: Not All Weights are Created Equal, 1995, NIPS.

[16]  Robert Tibshirani et al. Discriminant Adaptive Nearest Neighbor Classification, 1995, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Gerd Hirzinger et al. Solving the Ill-Conditioning in Neural Network Learning, 1996, Neural Networks: Tricks of the Trade.

[18]  Ralph Neuneier et al. How to Train Neural Networks, 1996, Neural Networks: Tricks of the Trade.

[19]  Joshua B. Tenenbaum et al. Separating Style and Content, 1996, NIPS.

[20]  N. Schraudolph. Slope Centering: Making Shortcut Weights Effective, 1998.

[21]  G. B. Orr and K.-R. Müller, eds. Neural Networks: Tricks of the Trade, 1998, Lecture Notes in Computer Science.

[22]  Jürgen Schmidhuber et al. Feature Extraction Through LOCOCODE, 1999, Neural Computation.

[23]  Peter D. Turney. Robust Classification with Context-Sensitive Features, 2002, ArXiv.

[24]  A. K. Rigler et al. Accelerating the convergence of the back-propagation method, 1988, Biological Cybernetics.

[25]  Lutz Prechelt. Early Stopping - But When?, 1998, Neural Networks: Tricks of the Trade.

[26]  Gary William Flake et al. Square Unit Augmented, Radially Extended, Multilayer Perceptrons, 1996, Neural Networks: Tricks of the Trade.

[27]  David S. Touretzky et al. Proceedings of the 1993 Connectionist Models Summer School, 1994.