An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

Optimization problems under affine constraints appear in various areas of machine learning. We consider the task of minimizing a smooth, strongly convex function F(x) under the affine constraint Kx = b, with an oracle that provides evaluations of the gradient of F and multiplications by K and its transpose. We provide lower bounds on the number of gradient computations and matrix multiplications needed to achieve a given accuracy. We then propose an accelerated primal–dual algorithm that achieves these lower bounds. Our algorithm is the first optimal algorithm for this class of problems.
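To make the oracle model concrete, the following is a minimal sketch of a plain, non-accelerated primal–dual iteration for minimizing F(x) subject to Kx = b. It touches F only through its gradient and K only through products with K and its transpose. It uses the PAPC (proximal alternating predictor–corrector) update on the Lagrangian F(x) + ⟨y, Kx − b⟩. This is an illustrative swap-in and not the accelerated algorithm proposed here. The helper names `matvec`, `rmatvec` and `papc`, the step sizes `tau` and `sigma` (assumed to satisfy tau < 2/L and tau·sigma·‖K‖² ≤ 1), and the test problem are all hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(K: Matrix, x: Vector) -> Vector:
    """One multiplication by K: returns K x."""
    return [sum(k_ij * x_j for k_ij, x_j in zip(row, x)) for row in K]


def rmatvec(K: Matrix, y: Vector) -> Vector:
    """One multiplication by the transpose of K: returns K^T y."""
    out = [0.0] * len(K[0])
    for row, y_i in zip(K, y):
        for j, k_ij in enumerate(row):
            out[j] += k_ij * y_i
    return out


def papc(grad_F: Callable[[Vector], Vector], K: Matrix, b: Vector,
         x0: Vector, tau: float, sigma: float, iters: int) -> Vector:
    """Non-accelerated PAPC iteration on the Lagrangian F(x) + <y, Kx - b>.

    Illustrative sketch, not the paper's accelerated method. Assumed step
    sizes: tau < 2/L (L = smoothness constant of F) and
    tau * sigma * ||K||^2 <= 1.
    """
    x = list(x0)
    y = [0.0] * len(b)           # dual variable (Lagrange multiplier)
    KTy = [0.0] * len(x0)        # cached K^T y, reused across iterations
    for _ in range(iters):
        g = grad_F(x)            # 1 gradient evaluation per iteration
        # Predictor: tentative primal step with the current multiplier.
        p = [xi - tau * (gi + ti) for xi, gi, ti in zip(x, g, KTy)]
        # Dual ascent on the constraint residual K p - b (1 product with K).
        Kp = matvec(K, p)
        y = [yi + sigma * (kpi - bi) for yi, kpi, bi in zip(y, Kp, b)]
        # Corrector: primal step with the updated multiplier (1 product with K^T).
        KTy = rmatvec(K, y)
        x = [xi - tau * (gi + ti) for xi, gi, ti in zip(x, g, KTy)]
    return x


if __name__ == "__main__":
    # Hypothetical test problem: F(x) = 0.5 * ||x - c||^2, so L = mu = 1 and the
    # solution is the projection of c onto {x : K x = b}.
    K = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    b = [1.0, 2.0]
    c = [1.0, 0.0, 0.0]
    x = papc(lambda v: [vi - ci for vi, ci in zip(v, c)], K, b,
             [0.0, 0.0, 0.0], tau=0.5, sigma=0.5, iters=200)
    print([round(v, 6) for v in x])  # → [0.333333, 1.333333, 0.666667]
```

Each iteration costs one gradient evaluation, one product with K and one with its transpose. Caching K^T y between iterations keeps the transpose products at one. In the example, sigma = 1/(tau·‖K‖_F²) = 0.5 is a conservative choice, since the Frobenius norm upper-bounds the spectral norm. This plain scheme is only meant to show the oracle calls. It is not claimed to match the lower bounds discussed above.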
