A Linearly-Convergent Stochastic L-BFGS Algorithm

We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed by Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent due to Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving these problems to high precision. Furthermore, we show that our algorithm performs well for a wide range of step sizes, often differing by several orders of magnitude.
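To make the two ingredients concrete, below is a minimal NumPy sketch of one way to combine SVRG-style variance reduction (Johnson and Zhang, 2013) with L-BFGS curvature built from subsampled Hessian-vector products (in the spirit of Byrd et al., 2014), specialized here to least squares so a Hessian-vector product is a single pair of matrix products. The function names, default hyperparameters, and the iterate-averaging window are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_loop_recursion(g, s_list, y_list):
    """Apply the L-BFGS inverse-Hessian approximation to g using the
    stored curvature pairs (s, y) via the standard two-loop recursion."""
    q = g.copy()
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * s.dot(q)
        alphas.append(alpha)
        q -= alpha * y
    if s_list:  # usual initial scaling H0 = (s'y / y'y) I
        s, y = s_list[-1], y_list[-1]
        q *= s.dot(y) / y.dot(y)
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(q)
        q += (alpha - beta) * s
    return q

def svrg_lbfgs(X, t, eta=0.05, n_epochs=20, batch=10, hess_batch=100,
               pair_every=10, memory=10, seed=0):
    """Sketch of variance-reduced stochastic L-BFGS for least squares,
    f(w) = (1/2n) ||Xw - t||^2. All hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    s_list, y_list = [], []
    u_prev, u_accum = None, np.zeros(d)
    for _ in range(n_epochs):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - t) / n  # full gradient at the snapshot
        for k in range(1, n // batch + 1):
            S = rng.choice(n, batch, replace=False)
            # SVRG variance-reduced gradient estimate.
            v = (X[S].T @ (X[S] @ w - t[S])
                 - X[S].T @ (X[S] @ w_snap - t[S])) / batch + mu
            w -= eta * two_loop_recursion(v, s_list, y_list)
            u_accum += w
            if k % pair_every == 0:  # periodically refresh curvature pairs
                u = u_accum / pair_every
                u_accum = np.zeros(d)
                if u_prev is not None:
                    s_vec = u - u_prev
                    # y = (subsampled Hessian) @ s, a Hessian-vector
                    # product on a fresh subsample, following the idea
                    # of Byrd et al. (2014).
                    T = rng.choice(n, hess_batch, replace=False)
                    y_vec = X[T].T @ (X[T] @ s_vec) / hess_batch
                    s_list.append(s_vec)
                    y_list.append(y_vec)
                    if len(s_list) > memory:
                        s_list.pop(0)
                        y_list.pop(0)
                u_prev = u
    return w
```

On a synthetic strongly convex least-squares instance, plotting f(w) - f(w*) per epoch for this sketch should show the geometric decrease the abstract describes as linear convergence, and eta can typically be varied over a fairly wide range without breaking convergence.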

[1] J. J. Moré et al. Quasi-Newton Methods, Motivation and Theory, 1974.

[2] R. Dembo et al. Inexact Newton Methods, 1982.

[3] Trond Steihaug et al. Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program., 1983.

[4] Jorge Nocedal et al. On the limited memory BFGS method for large scale optimization. Math. Program., 1989.

[5] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 1994.

[6] Stephen J. Wright et al. Numerical Optimization. Springer, 2006.

[7] Yann LeCun et al. Large Scale Online Learning. NIPS, 2003.

[8] Yiming Yang et al. RCV1: A New Benchmark Collection for Text Categorization Research. J. Mach. Learn. Res., 2004.

[9] H. Robbins. A Stochastic Approximation Method, 1951.

[10] Simon Günter et al. A Stochastic Quasi-Newton Method for Online Convex Optimization. AISTATS, 2007.

[11] Yurii Nesterov et al. Primal-dual subgradient methods for convex problems. Math. Program., 2005.

[12] Patrick Gallinari et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent. J. Mach. Learn. Res., 2009.

[13] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res., 2011.

[14] Léon Bottou et al. Large-Scale Machine Learning with Stochastic Gradient Descent. COMPSTAT, 2010.

[15] Thierry Bertin-Mahieux et al. The Million Song Dataset. ISMIR, 2011.

[16] Mark W. Schmidt et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. NIPS, 2012.

[17] Shai Shalev-Shwartz et al. Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res., 2012.

[18] Tong Zhang et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. NIPS, 2013.

[19] Xi Chen et al. Variance Reduction for Stochastic Gradient Optimization. NIPS, 2013.

[20] Geoffrey E. Hinton et al. On the importance of initialization and momentum in deep learning. ICML, 2013.

[21] Christopher Ré et al. Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Program. Comput., 2013.

[22] John Langford et al. A reliable effective terascale linear learning system. J. Mach. Learn. Res., 2011.

[23] Francis Bach et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. NIPS, 2014.

[24] Surya Ganguli et al. Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods. ICML, 2013.

[25] Aryan Mokhtari et al. RES: Regularized Stochastic BFGS Algorithm. IEEE Transactions on Signal Processing, 2014.

[26] Aryan Mokhtari et al. Global convergence of online limited memory BFGS. J. Mach. Learn. Res., 2014.

[27] Thomas Hofmann et al. A Variance Reduced Stochastic Newton Method. arXiv, 2015.

[28] Sham M. Kakade et al. Competing with the Empirical Risk Minimizer in a Single Pass. COLT, 2014.

[29] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[30] Jorge Nocedal et al. A Stochastic Quasi-Newton Method for Large-Scale Optimization. SIAM J. Optim., 2014.

[31] Shiqian Ma et al. Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization. SIAM J. Optim., 2014.