Large-scale randomized-coordinate descent methods with non-separable linear constraints

We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous parallel; and (iv) stochastic. We illustrate empirical behavior of our algorithms by simulation experiments.

[1]  R. Glowinski,et al.  Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires , 1975 .

[2]  C. Lawson,et al.  Solving least squares problems , 1976, Classics in applied mathematics.

[3]  B. Mercier,et al.  A dual algorithm for the solution of nonlinear variational problems via finite element approximation , 1976 .

[4]  A. Auslender Optimisation : méthodes numériques , 1976 .

[5]  P. Tseng,et al.  On the convergence of the coordinate descent method for convex differentiable minimization , 1992 .

[6]  Charles L. Lawson,et al.  Solving least squares problems , 1976, Classics in applied mathematics.

[7]  O. Nelles,et al.  An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.

[8]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[9]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[10]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[11]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[12]  P. Tseng,et al.  Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization , 2009 .

[13]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[14]  Joseph K. Bradley,et al.  Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.

[15]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[16]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[17]  Inderjit S. Dhillon,et al.  Fast coordinate descent methods with variable selection for non-negative matrix factorization , 2011, KDD.

[18]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[19]  Pradeep Ravikumar,et al.  Sparse inverse covariance matrix estimation using quadratic approximation , 2011, MLSLP.

[20]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[21]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[22]  Amir Beck,et al.  On the Convergence of Block Coordinate Descent Type Methods , 2013, SIAM J. Optim..

[23]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[24]  Shai Shalev-Shwartz,et al.  Accelerated Mini-Batch Stochastic Dual Coordinate Ascent , 2013, NIPS.

[25]  Y. Nesterov,et al.  A RANDOM COORDINATE DESCENT METHOD ON LARGE-SCALE OPTIMIZATION PROBLEMS WITH LINEAR CONSTRAINTS , 2013 .

[26]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[27]  Ambuj Tewari,et al.  On the Nonasymptotic Convergence of Cyclic Coordinate Descent Methods , 2013, SIAM J. Optim..

[28]  Chih-Jen Lin,et al.  Iteration complexity of feasible descent methods for convex optimization , 2014, J. Mach. Learn. Res..

[29]  Ion Necoara,et al.  A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints , 2013, Comput. Optim. Appl..

[30]  Coordinate Descent Algorithms With Coupling Constraints : Lessons Learned , 2014 .

[31]  Peter Richtárik,et al.  Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function , 2011, Mathematical Programming.

[32]  Lin Xiao,et al.  On the complexity analysis of randomized block-coordinate descent methods , 2013, Mathematical Programming.

[33]  Stephen J. Wright,et al.  An asynchronous parallel stochastic coordinate descent algorithm , 2013, J. Mach. Learn. Res..

[34]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[35]  Peter Richtárik,et al.  Inexact Coordinate Descent: Complexity and Preconditioning , 2013, J. Optim. Theory Appl..

[36]  Peter Richtárik,et al.  Distributed Coordinate Descent Method for Learning with Big Data , 2013, J. Mach. Learn. Res..

[37]  Peter Richtárik,et al.  Parallel coordinate descent methods for big data optimization , 2012, Mathematical Programming.

[38]  Zhi-Quan Luo,et al.  Iteration complexity analysis of block coordinate descent methods , 2013, Math. Program..

[39]  Zhi-Quan Luo,et al.  On the linear convergence of the alternating direction method of multipliers , 2012, Mathematical Programming.

[40]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .