Towards Faster Stochastic Gradient Search

Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate η (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" learning rate schedules (Darken and Moody, 1990) displays the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima. However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first completely automatic adaptive learning rates that achieve the optimal rate of convergence.
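To make the idea of a "search then converge" schedule concrete, the sketch below implements one simple form of this kind, η(t) = η₀ / (1 + t/τ): the rate stays roughly constant near η₀ during the early "search" phase (t ≪ τ) and decays like 1/t during the "converge" phase (t ≫ τ). The parameter names (eta0, tau) and the exact functional form are illustrative assumptions rather than the precise schedule of Darken and Moody (1990); τ plays the role of the key parameter that the user would otherwise have to set by hand.

```python
import numpy as np

def search_then_converge_lr(t, eta0=0.1, tau=100.0):
    """Illustrative "search then converge" learning-rate schedule (assumed form).

    eta(t) ~ eta0          for t << tau  ("search" phase: large, roughly constant steps)
    eta(t) ~ eta0 * tau/t  for t >> tau  ("converge" phase: ~1/t decay, the rate
                                          associated with optimal asymptotic convergence)
    """
    return eta0 / (1.0 + t / tau)

def sgd_quadratic(num_steps=5000, eta0=0.1, tau=100.0, noise=0.5, seed=0):
    """Toy stochastic gradient descent on a 1-D quadratic with E[loss] = 0.5 * w**2,
    stepping with the schedule above. Purely a demonstration, not the paper's method."""
    rng = np.random.default_rng(seed)
    w = 5.0
    for t in range(num_steps):
        grad = w + noise * rng.standard_normal()      # noisy sample of the true gradient w
        w -= search_then_converge_lr(t, eta0, tau) * grad
    return w

if __name__ == "__main__":
    print(sgd_quadratic())  # w is driven close to the optimum at 0
```

In this toy setting the behavior of the schedule is easy to see: a τ that is too small ends the search phase before the iterate has escaped its initial region, while a τ that is too large delays the 1/t decay needed for fast final convergence, which is why leaving this choice to the user is the weakness the proposed automatic schedules aim to remove.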