Efficient Online and Batch Learning Using Forward Backward Splitting

We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimizing a regularization term against staying in close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as l1. We derive concrete and very simple algorithms for minimization of loss functions with l1, l2, squared l2, and l∞ regularization. We also show how to construct efficient algorithms for mixed-norm l1/lq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural data sets.
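For l1 regularization, the two-phase recipe described above has a particularly simple form: the first phase is an ordinary gradient step on the loss alone, and the second phase, which balances the regularizer against proximity to the intermediate point, reduces to coordinate-wise soft-thresholding. The sketch below illustrates that structure on l1-regularized logistic regression; the choice of loss, the decaying step size eta0/sqrt(t), the function and parameter names, and the synthetic data are illustrative assumptions, not the paper's exact algorithmic or experimental setup.

```python
import numpy as np

def fobos_l1(X, y, lam=0.05, eta0=0.5, epochs=20, seed=0):
    """Sketch of forward-backward splitting with l1 regularization.

    Each iteration has two phases:
      1. Forward step: unconstrained gradient descent on the loss only.
      2. Backward step: solve argmin_w 0.5*||w - w_half||^2 + eta*lam*||w||_1,
         whose closed-form solution is coordinate-wise soft-thresholding.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 1
    for _ in range(epochs):
        for i in rng.permutation(n):
            eta = eta0 / np.sqrt(t)  # one common decaying step-size choice
            t += 1
            # Phase 1: gradient step on the logistic loss for example i.
            margin = y[i] * X[i].dot(w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w_half = w - eta * grad
            # Phase 2: l1 proximal step = soft-thresholding; zeros out small weights.
            w = np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lam, 0.0)
    return w

if __name__ == "__main__":
    # Synthetic sparse problem: only the first 5 of 50 features carry signal.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0
    y = np.sign(X.dot(w_true) + 0.1 * rng.standard_normal(200))
    w = fobos_l1(X, y)
    print("nonzero coordinates:", np.count_nonzero(w))
```

Swapping the regularizer only changes the second phase: for squared l2 it becomes a multiplicative shrinkage of w_half, and for l∞ or mixed norms it becomes a small projection-type subproblem, while the forward gradient step is unchanged.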
