On the “degrees of freedom” of the lasso

We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso, a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results at hand, various model selection criteria (Cp, AIC and BIC) become available and, combined with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.
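A minimal numerical sketch of the idea, not taken from the paper: the degrees-of-freedom estimate is simply the count of nonzero lasso coefficients, which can be plugged into a Cp/SURE-type criterion to pick the penalty level. The simulated data, the penalty grid, and the coordinate-descent solver used here (in place of LARS) are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate-descent lasso (illustrative stand-in for LARS):
    minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r
            # soft-thresholding update
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

# Simulated sparse regression problem (hypothetical example data).
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
sigma = 1.0
y = X @ beta_true + sigma * rng.standard_normal(n)

best = None
for lam in [0.1, 1.0, 5.0, 20.0, 50.0]:
    b = lasso_cd(X, y, lam)
    df_hat = np.count_nonzero(b)            # unbiased df estimate from the paper
    rss = np.sum((y - X @ b) ** 2)
    cp = rss / sigma**2 - n + 2 * df_hat    # Cp / SURE-type model selection criterion
    if best is None or cp < best[0]:
        best = (cp, lam, df_hat)

best_cp, best_lam, best_df = best
```

The criterion trades off residual sum of squares against model complexity, with complexity measured by the nonzero count; minimizing it over the penalty path selects the lasso fit, mirroring how the paper combines the df estimate with Cp/AIC/BIC and LARS.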
