Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with a constant minibatch size converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point with constant minibatches. Furthermore, using a variant of these algorithms, we show provably faster convergence than batch proximal gradient descent. Finally, we prove a global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, which subsumes several recent works. This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.
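As a rough illustration of the problem class described above, consider minimizing F(x) = (1/n) sum_i f_i(x) + h(x), with each f_i smooth but possibly nonconvex and h convex but nonsmooth. The sketch below is not the authors' algorithm. It assumes an SVRG-style variance-reduced gradient estimate combined with a proximal step, and every concrete choice in it is hypothetical: the sigmoid squared loss for f_i, the l1 penalty for h, the step size eta, the minibatch size batch, and the function names.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (assumed choice of convex, nonsmooth h)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def mean_grad(x, A, y):
    """Mean gradient of the assumed smooth nonconvex losses
    f_i(x) = (sigmoid(a_i . x) - y_i)^2 over the rows of A."""
    s = 1.0 / (1.0 + np.exp(-(A @ x)))
    coeff = 2.0 * (s - y) * s * (1.0 - s)
    return coeff @ A / len(y)

def objective(x, A, y, lam):
    """F(x) = (1/n) sum_i f_i(x) + lam * ||x||_1."""
    s = 1.0 / (1.0 + np.exp(-(A @ x)))
    return np.mean((s - y) ** 2) + lam * np.sum(np.abs(x))

def prox_svrg(A, y, lam=0.01, eta=0.1, epochs=20, batch=4, seed=0):
    """Illustrative proximal, SVRG-style loop with a constant minibatch size."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = max(1, n // batch)  # inner-loop length (hypothetical choice)
    x = np.zeros(d)
    for _ in range(epochs):
        snapshot = x.copy()
        mu = mean_grad(snapshot, A, y)  # full gradient at the snapshot
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            # Variance-reduced estimate of the gradient of the smooth part
            g = mean_grad(x, A[idx], y[idx]) - mean_grad(snapshot, A[idx], y[idx]) + mu
            # Proximal step handles the nonsmooth convex part h
            x = soft_threshold(x - eta * g, eta * lam)
    return x
```

A possible usage on synthetic data: draw a matrix A of standard normal rows, set y to binary labels from a planted linear model, call prox_svrg(A, y), and compare objective(x, A, y, 0.01) against its value at the zero vector.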

[1] J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien, 1962.

[2] Boris Polyak. Gradient methods for the minimisation of functionals, 1963.

[3] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.

[4] M. Fukushima, et al. A minimization method for the sum of a convex function and a continuously differentiable function, 1981.

[5] M. Fukushima, et al. A generalized proximal point algorithm for certain non-convex minimization problems, 1981.

[6] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.

[7] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.

[8] Jean-Yves Audibert. Optimization for Machine Learning, 1995.

[9] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.

[10] H. Robbins. A Stochastic Approximation Method, 1951.

[11] Julien Mairal, et al. Convex optimization with sparsity-inducing norms, 2011.

[12] Yurii Nesterov, et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, 2012, SIAM J. Optim.

[13] Suvrit Sra, et al. Scalable nonconvex inexact proximal splitting, 2012, NIPS.

[14] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res.

[15] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[16] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.

[17] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.

[18] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.

[19] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.

[20] Mark W. Schmidt, et al. Linear Convergence of Proximal-Gradient Methods under the Polyak-Łojasiewicz Condition, 2015.

[21] Ohad Shamir, et al. A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate, 2014, ICML.

[22] Léon Bottou, et al. A Lower Bound for the Optimization of Finite Sums, 2014, ICML.

[23] Stephen J. Wright, et al. Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties, 2014, SIAM J. Optim.

[24] Alexander J. Smola, et al. On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants, 2015, NIPS.

[25] Ohad Shamir, et al. Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity, 2015, ICML.

[26] Alexander J. Smola, et al. Fast Incremental Method for Nonconvex Optimization, 2016, ArXiv.

[27] Zeyuan Allen Zhu, et al. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives, 2015, ICML.

[28] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[29] Zeyuan Allen Zhu, et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.

[30] Jarvis D. Haupt, et al. Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction, 2016.

[31] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.

[32] Tuo Zhao, et al. Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning, 2016, ICML.

[33] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.

[34] Yi Zhou, et al. An optimal randomized incremental gradient method, 2015, Mathematical Programming.