SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the "Wild Track" of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008).
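
To make the description above concrete, here is a minimal Python sketch in the same spirit: a linear SVM trained by stochastic gradient descent whose per-example step is rescaled by a diagonal matrix B (the second-order information), while the regularization step and the secant-based refresh of B run on their own slower schedule, once every `skip` examples. This is an illustrative approximation under stated assumptions, not the exact SGD-QN update; the function names (`sgdqn_sketch`, `reg_grad`), the clipping bounds, and the specific constants are assumptions made for the example.

```python
import numpy as np

def reg_grad(w, x, y, lam):
    """Gradient of (lam/2)*||w||^2 + hinge(y * w.x) at w, for one example."""
    g = lam * w
    if y * (x @ w) < 1.0:
        g = g - y * x
    return g

def sgdqn_sketch(X, y, lam=1e-4, epochs=5, t0=1e4, skip=16, seed=0):
    """Illustrative sketch only: a simplified diagonal quasi-Newton SGD,
    not the exact SGD-QN algorithm from the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    B = np.full(d, 1.0 / lam)          # diagonal scaling ~ inverse curvature
    countdown, update_B, t = skip, False, 0

    for _ in range(epochs):
        for i in rng.permutation(n):
            x_i, y_i = X[i], y[i]
            eta = 1.0 / (t + t0)

            if update_B:
                # remember state before the step to form a secant pair
                g_old, w_old = reg_grad(w, x_i, y_i, lam), w.copy()

            # cheap, frequent part of the update: rescaled hinge-loss step
            if y_i * (x_i @ w) < 1.0:
                w += eta * B * (y_i * x_i)

            if update_B:
                # secant estimate of the inverse diagonal curvature,
                # clipped to keep B well conditioned
                dw = w - w_old
                dg = reg_grad(w, x_i, y_i, lam) - g_old
                mask = np.abs(dg) > 1e-12
                ratio = np.clip(dw[mask] / dg[mask], 0.0, 1.0 / lam)
                B[mask] += (2.0 / skip) * (ratio - B[mask])
                update_B = False

            countdown -= 1
            if countdown <= 0:
                # slow schedule: amortized regularization step, and flag a
                # curvature measurement against the next example
                w -= skip * eta * lam * B * w
                update_B, countdown = True, skip
            t += 1
    return w
```

The `skip` parameter controls the trade-off highlighted in the abstract: larger values keep the per-iteration cost close to that of plain first-order SGD, at the price of a less frequently refreshed curvature estimate.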
