Forward stagewise regression and the monotone lasso

We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. Efron, Hastie, Johnstone & Tibshirani (2004) proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence is that, while the lasso makes optimal progress in reducing the residual sum-of-squares per unit increase in the $L_1$-norm of the coefficient vector $\beta$, forward stagewise is optimal per unit of $L_1$ arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, so that the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.
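To make the distinction between $L_1$-norm and $L_1$ arc-length concrete, the following is a minimal Python sketch of incremental forward stagewise regression, not the paper's own implementation. It assumes centered predictors and response, a fixed step size `eps`, and a fixed step budget; the function names (`forward_stagewise`, `l1_norm`, `l1_arc_length`) and the toy data are hypothetical and chosen only for illustration.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=1000, tol=1e-12):
    """Incremental forward stagewise regression (illustrative sketch).

    X: list of n rows, each a list of p predictor values (assumed centered).
    y: list of n responses (assumed centered).
    Returns the list of coefficient vectors visited, starting from zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    path = [beta[:]]
    for _ in range(n_steps):
        # Inner product of each predictor with the current residual.
        corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Pick the predictor most correlated (in absolute value) with the residual.
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < tol:
            break  # residual is (numerically) orthogonal to every predictor
        # Move that coefficient a small amount in the direction of the correlation.
        delta = eps if corr[j] > 0 else -eps
        beta[j] += delta
        for i in range(n):
            resid[i] -= delta * X[i][j]
        path.append(beta[:])
    return path


def l1_norm(beta):
    """L1 norm of a coefficient vector."""
    return sum(abs(b) for b in beta)


def l1_arc_length(path):
    """Total L1 distance traveled along a coefficient path."""
    return sum(
        sum(abs(b1 - b0) for b0, b1 in zip(prev, cur))
        for prev, cur in zip(path, path[1:])
    )


# Toy data (hypothetical): two orthogonal, centered predictors, y = 2*x1 + 1*x2.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 1.0, -2.0, -1.0]

path = forward_stagewise(X, y, eps=0.01, n_steps=1000)
final_beta = path[-1]
arc = l1_arc_length(path)
norm = l1_norm(final_beta)
```

The arc-length is never smaller than the $L_1$-norm of the final coefficient vector. The two agree when every coefficient moves monotonically, as in this orthogonal toy example, and they separate as soon as some coefficient reverses direction, which is the situation in which the lasso and forward stagewise paths can differ.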

[1]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[2]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[3]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[4]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[5]  M. R. Osborne,et al.  A new approach to variable selection in least squares problems , 2000 .

[6]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[7]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[8]  Joel A. Tropp,et al.  Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[9]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[10]  P. Zhao Boosted Lasso , 2004 .

[11]  Bogdan E. Popescu,et al.  Gradient Directed Regularization , 2004 .

[12]  Saharon Rosset,et al.  Tracking Curved Regularized Optimization Solution Paths , 2004, NIPS 2004.

[13]  Mee Young Park,et al.  L1-regularization path algorithm for generalized linear models , 2006 .

[14]  D. Hinkley Annals of Statistics , 2006 .

[15]  P. Bühlmann Boosting for high-dimensional linear models , 2006 .

[16]  Joel A. Tropp,et al.  Just relax: convex programming methods for identifying sparse signals in noise , 2006, IEEE Transactions on Information Theory.

[17]  Mee Young Park,et al.  L1‐regularization path algorithm for generalized linear models , 2007 .

[18]  S. Rosset,et al.  Piecewise linear regularized solution paths , 2007, 0708.2197.

[19]  Yaakov Tsaig,et al.  Fast Solution of $\ell_1$-Norm Minimization Problems When the Solution May Be Sparse , 2008, IEEE Transactions on Information Theory.

[20]  D. Donoho,et al.  Fast Solution of $\ell_1$-Norm Minimization Problems When the Solution May Be Sparse , 2008 .