Stochastic Frank-Wolfe methods for nonconvex optimization

We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limited. In this paper, we propose nonconvex stochastic Frank-Wolfe methods and analyze their convergence properties. Furthermore, for objective functions that decompose into a finite-sum, we leverage ideas from variance reduction for convex optimization to obtain new variance reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method.

[1]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[2]  John Darzentas,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[3]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[4]  H. Robbins A Stochastic Approximation Method , 1951 .

[5]  Peter L. Bartlett,et al.  Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..

[6]  S. Fujishige,et al.  A Submodular Function Minimization Algorithm Based on the Minimum-Norm Base ⁄ , 2009 .

[7]  Elad Hazan,et al.  Projection-free Online Learning , 2012, ICML.

[8]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[9]  Saeed Ghadimi,et al.  Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[10]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[11]  Francis Bach,et al.  SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.

[12]  Lin Xiao,et al.  A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..

[13]  Léon Bottou,et al.  A Lower Bound for the Optimization of Finite Sums , 2014, ICML.

[14]  Elad Hazan,et al.  Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets , 2014, ICML.

[15]  Martin Jaggi,et al.  On the Global Linear Convergence of Frank-Wolfe Optimization Variants , 2015, NIPS.

[16]  Zaïd Harchaoui,et al.  Conditional gradient algorithms for norm-regularized smooth convex optimization , 2013, Math. Program..

[17]  Alexander J. Smola,et al.  Fast Stochastic Methods for Nonsmooth Nonconvex Optimization , 2016, ArXiv.

[18]  Alexander J. Smola,et al.  Fast Incremental Method for Nonconvex Optimization , 2016, ArXiv.

[19]  Alexander J. Smola,et al.  Fast incremental method for smooth nonconvex optimization , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).

[20]  Simon Lacoste-Julien,et al.  Convergence Rate of Frank-Wolfe for Non-Convex Objectives , 2016, ArXiv.

[21]  Saeed Ghadimi,et al.  Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.

[22]  Haipeng Luo,et al.  Variance-Reduced and Projection-Free Stochastic Optimization , 2016, ICML.

[23]  Alexander J. Smola,et al.  Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[24]  Mark W. Schmidt,et al.  Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.

[25]  Yaoliang Yu,et al.  Generalized Conditional Gradient for Sparse Estimation , 2014, J. Mach. Learn. Res..

[26]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .