Accelerated Gradient Descent by Factor-Centering Decomposition

Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests, for instance, that shortcut connections (a well-known architectural feature) should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
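
To make the combination of shortcut connections and slope centering concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: it assumes slope centering amounts to subtracting the batch mean of each hidden unit's activation-function slope from the corresponding gradient factor, while an input-to-output shortcut connection carries the linear component of the mapping. All function and variable names (e.g. `slope_center`, `Ws`) are hypothetical.

```python
# Sketch: two-layer network with a shortcut path and centered slope factors.
import numpy as np

rng = np.random.default_rng(0)

def tanh_slope(net):
    # Derivative of tanh, the pattern-dependent "slope" factor f'(net).
    return 1.0 - np.tanh(net) ** 2

def forward(X, W1, W2, Ws):
    net_h = X @ W1                      # hidden pre-activations
    H = np.tanh(net_h)                  # hidden activations
    Y = H @ W2 + X @ Ws                 # nonlinear path plus shortcut path
    return net_h, H, Y

def gradients(X, T, W1, W2, Ws, slope_center=True):
    net_h, H, Y = forward(X, W1, W2, Ws)
    E = Y - T                           # output error (squared-error loss)
    n = len(X)
    dW2 = H.T @ E / n
    dWs = X.T @ E / n                   # shortcut weights see the raw error
    S = tanh_slope(net_h)               # slope factor per pattern and unit
    if slope_center:
        S = S - S.mean(axis=0)          # center slopes across the batch
    dW1 = X.T @ ((E @ W2.T) * S) / n
    return dW1, dW2, dWs

# Toy usage: gradient descent on a random nonlinear target.
X = rng.normal(size=(64, 5))
T = np.tanh(X @ rng.normal(size=(5, 2)))
W1 = rng.normal(scale=0.1, size=(5, 8))
W2 = rng.normal(scale=0.1, size=(8, 2))
Ws = rng.normal(scale=0.1, size=(5, 2))
for _ in range(200):
    dW1, dW2, dWs = gradients(X, T, W1, W2, Ws)
    W1 -= 0.1 * dW1; W2 -= 0.1 * dW2; Ws -= 0.1 * dWs
```

In this sketch the shortcut weights Ws absorb the mean (biased) component of the mapping, which is why removing the mean slope from the centered path does not lose information; the paper's prediction is that the two modifications are most effective when used together.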