Composite objective mirror descent

We present a new method for regularized convex optimization and analyze it in both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analyses and algorithms. We also derive specific instantiations of our method for commonly used regularization functions, such as the ℓ1 norm, mixed norms, and the trace norm.
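As a rough illustration, here is a minimal sketch of the Euclidean special case. It assumes the composite step has the form x_{t+1} = argmin_x { η⟨g_t, x⟩ + B_ψ(x, x_t) + η r(x) }, where g_t is a (sub)gradient of the round-t loss at x_t and only the loss, not the regularizer r, is linearized. With ψ(x) = ½‖x‖², the Bregman divergence becomes ½‖x − x_t‖², so the step reduces to a gradient step followed by the proximal map of ηr, which is forward-backward splitting. The helper names (`soft_threshold`, `group_soft_threshold`, `comid_step`, `run_comid`), the fixed step size, and the toy quadratic loss are hypothetical choices for illustration, not the paper's code.

```python
from typing import Callable, List, Sequence
import math

Vector = List[float]
Prox = Callable[[Sequence[float], float], Vector]


def soft_threshold(v: Sequence[float], tau: float) -> Vector:
    # Proximal map of tau * ||x||_1: shrink every coordinate toward zero by tau,
    # setting coordinates with |v_i| <= tau exactly to zero.
    return [vi - tau if vi > tau else vi + tau if vi < -tau else 0.0 for vi in v]


def group_soft_threshold(v: Sequence[float], tau: float,
                         groups: Sequence[Sequence[int]]) -> Vector:
    # Proximal map of tau * sum_g ||x_g||_2 (an l1/l2 mixed norm):
    # shrink the Euclidean norm of each block by tau, zeroing whole blocks.
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


def comid_step(x: Sequence[float], grad: Sequence[float],
               eta: float, lam: float, prox: Prox) -> Vector:
    # Assumed Euclidean case psi(x) = 0.5*||x||^2, so B_psi(x, x_t) = 0.5*||x - x_t||^2.
    # Only the loss is linearized; the regularizer lam*r stays exact and is
    # handled by its proximal map with threshold eta*lam.
    v = [xi - eta * gi for xi, gi in zip(x, grad)]
    return prox(v, eta * lam)


def run_comid(x0: Sequence[float], grad_fn: Callable[[int, Vector], Vector],
              eta: float, lam: float, prox: Prox, steps: int) -> Vector:
    # grad_fn(t, x) returns a (sub)gradient of the round-t loss at x; in an
    # online or stochastic setting it would change from round to round.
    x = list(x0)
    for t in range(steps):
        x = comid_step(x, grad_fn(t, x), eta, lam, prox)
    return x


if __name__ == "__main__":
    # Toy problem: f(x) = 0.5*||x - c||^2 + lam*||x||_1, whose minimizer is
    # soft_threshold(c, lam); the iterates should approach it.
    c = [3.0, -0.5, 1.2, 0.1]
    x = run_comid([0.0] * 4, lambda t, x: [xi - ci for xi, ci in zip(x, c)],
                  eta=0.5, lam=1.0, prox=soft_threshold, steps=100)
    print([round(xi, 6) for xi in x])
    print(soft_threshold(c, 1.0))
```

Swapping `soft_threshold` for `lambda v, tau: group_soft_threshold(v, tau, groups)` gives the mixed-norm variant with no other changes. For the trace norm, the analogous Euclidean step applies the same soft-thresholding to the singular values of a matrix iterate; it is omitted here to keep the sketch dependency-free. Non-Euclidean choices of ψ change the step itself and are not covered by this sketch.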
