A Significance Test for the Lasso

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieve perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a σ²χ₁² distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than σ²χ₁² under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ₁ penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1).
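To make the first step concrete: at the first knot of the lasso path the covariance test statistic reduces to T₁ = λ₁(λ₁ − λ₂)/σ², where λ₁ ≥ λ₂ are the first two knots. The following minimal simulation sketch (ours, not code from the paper) checks the Exp(1) claim under the global null for an orthogonal design, where the knots are simply the sorted entries of |Xᵀy|; the sample sizes and variable names here are illustrative assumptions.

import numpy as np

# Sketch: under the global null y ~ N(0, sigma^2 I) with orthonormal
# predictor columns, the first two lasso knots are the two largest
# values of |X^T y|, and T1 = lambda_1 (lambda_1 - lambda_2) / sigma^2
# should be approximately Exp(1) for large p.
rng = np.random.default_rng(0)
n, p, sigma, reps = 200, 100, 1.0, 5000

# Orthonormal columns via reduced QR, so the knots are sorted |X^T y|.
X = np.linalg.qr(rng.standard_normal((n, p)))[0]

T1 = np.empty(reps)
for r in range(reps):
    y = sigma * rng.standard_normal(n)    # global null: no true signal
    lam = np.sort(np.abs(X.T @ y))[::-1]  # lambda_1 >= lambda_2 >= ...
    T1[r] = lam[0] * (lam[0] - lam[1]) / sigma**2

# Empirical quantiles of T1 versus Exp(1) quantiles, -log(1 - q).
for q in (0.50, 0.90, 0.95):
    print(f"q={q:.2f}  empirical={np.quantile(T1, q):.3f}  "
          f"Exp(1)={-np.log(1 - q):.3f}")

The orthogonal design is used only so that the knots have a closed form; for a general X, the second knot λ₂ must be taken from the LARS path rather than from sorted correlations.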
