A Generic Approach for Escaping Saddle Points

A central challenge in using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, escaping from saddles requires second-order methods, yet most such methods rely extensively on expensive Hessian-based computations, making them impractical in large-scale settings. To tackle this challenge, we introduce a generic framework that minimizes Hessian-based computations while provably converging to second-order critical points. Our framework carefully alternates between a first-order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive with the state of the art. Empirical results suggest that our strategy also enjoys good practical performance.
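The alternation the abstract describes can be made concrete with a short sketch: take cheap gradient steps while the gradient is large, and only when it becomes small invoke a second-order probe built from Hessian-vector products, which cost roughly as much as one extra gradient evaluation (Pearlmutter's trick). The sketch below is a minimal illustration under assumed names and constants (`f`, `grad`, `hess_vec`, `hess_bound`, the step size and tolerances are all illustrative choices), not the paper's exact subroutines.

```python
import numpy as np

def escape_saddles(f, grad, hess_vec, x0, lr=0.1, g_tol=1e-4,
                   curv_tol=1e-3, hess_bound=10.0,
                   probe_iters=100, max_iters=10_000):
    """Sketch: alternate first-order steps with a Hessian-vector-product probe.

    Assumptions (not from the paper): `grad(x)` returns the gradient,
    `hess_vec(x, v)` returns a Hessian-vector product, and `hess_bound`
    upper-bounds the Hessian spectral norm.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) > g_tol:
            x = x - lr * g                      # cheap first-order step
            continue
        # Small gradient: possibly near a saddle.  Estimate the most negative
        # Hessian eigenvalue with power iteration on B*I - H, using only
        # Hessian-vector products (no explicit Hessian is formed).
        v = np.random.randn(*x.shape)
        v /= np.linalg.norm(v)
        for _ in range(probe_iters):
            w = hess_bound * v - hess_vec(x, v)
            v = w / (np.linalg.norm(w) + 1e-12)
        lam_min = float(v @ hess_vec(x, v))     # Rayleigh quotient for lambda_min
        if lam_min >= -curv_tol:
            return x                            # approximate second-order stationary point
        # Negative curvature found: step along +/- v, keeping the lower f value.
        cand = [x + lr * v, x - lr * v]
        x = min(cand, key=f)
    return x
```

The design point this sketch is meant to convey is that the second-order work is confined to Hessian-vector products invoked only near suspected saddles, so the vast majority of iterations cost no more than an ordinary gradient step.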
