Least Angle Regression

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived:

(1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods.

(2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm.

(3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates.

LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
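To make the LARS/Lasso connection in property (1) concrete, the following is a minimal sketch (not the authors' published code) that traces the full coefficient path on synthetic data using scikit-learn's lars_path, which implements the algorithm described in this paper; the data, dimensions, and seed here are arbitrary illustrative choices.

```python
# Minimal illustration of the LARS / Lasso path on synthetic data.
# Assumes scikit-learn and NumPy are installed; all data below are made up.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]            # sparse "true" coefficient vector
y = X @ true_beta + rng.standard_normal(n)  # response with unit-variance noise

# method="lar" runs plain Least Angle Regression;
# method="lasso" applies the LARS modification that traces the full Lasso path.
alphas, active, coefs = lars_path(X, y, method="lasso")

print("order in which variables entered the model:", active)
print("coefficient path, shape (n_features, n_steps):", coefs.shape)
```

Each column of `coefs` is the coefficient vector at one breakpoint of the piecewise-linear path, so the whole family of Lasso solutions is recovered in a single pass, which is the computational saving the abstract refers to.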

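As a companion to property (3), the Cp-type criterion mentioned in the abstract can be written in the usual Mallows form; the sketch below assumes the paper's approximation that the degrees of freedom of the k-step LARS estimate is roughly k (here y is the response vector, mu-hat_k the k-step LARS fit, sigma^2 the residual variance, and n the number of observations).

```latex
% Cp-type estimate of prediction error for the k-step LARS fit \hat{\mu}_k,
% assuming the approximation df(\hat{\mu}_k) \approx k.
C_p(\hat{\mu}_k) \;\approx\; \frac{\lVert y - \hat{\mu}_k \rVert^{2}}{\sigma^{2}} \;-\; n \;+\; 2k
```

Minimizing this quantity over k gives the principled choice among the possible LARS estimates that the abstract describes.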