Exact Post-Selection Inference for Sequential Regression Procedures

ABSTRACT We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general scheme to perform valid inference after any selection event that can be characterized as y falling into a polyhedral set. This framework allows us to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or any step along the lasso regularization path, because, as it turns out, selection events for these procedures can be expressed as polyhedral constraints on y. The p-values associated with these tests are exactly uniform under the null distribution, in finite samples, yielding exact Type I error control. The tests can also be inverted to produce confidence intervals for appropriate underlying regression parameters. The R package selectiveInference, freely available on the CRAN repository, implements the new inference tools described in this article. Supplementary materials for this article are available online.
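The polyhedral scheme described in the abstract can be illustrated with a small sketch. It assumes y ~ N(θ, σ²I), a selection event written as {y : Γy ≥ u}, and a one-sided test of H0: vᵀθ = 0 against vᵀθ > 0. The notation (Γ, u, v) and every function name below are illustrative assumptions, not the API of the selectiveInference package. Conditional on the selection event, vᵀy follows a Gaussian truncated to an interval [V_lo, V_up] that depends on y only through the part of y orthogonal to v; the p-value is the tail probability under that truncated law.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(a, b):
    return sum(x * w for x, w in zip(a, b))


def truncation_limits(y, Gamma, u, v):
    """Interval [V_lo, V_up] that v'y must lie in for Gamma y >= u to hold,
    keeping the component of y orthogonal to v fixed."""
    vy, vv = dot(v, y), dot(v, v)
    lo, up = -math.inf, math.inf
    for row, u_j in zip(Gamma, u):
        rho = dot(row, v) / vv          # sensitivity of constraint j to v'y
        if rho == 0.0:
            continue                    # constraint does not involve v'y
        bound = vy + (u_j - dot(row, y)) / rho
        if rho > 0.0:
            lo = max(lo, bound)         # constraint gives a lower limit
        else:
            up = min(up, bound)         # constraint gives an upper limit
    return lo, up


def selective_pvalue(y, Gamma, u, v, sigma):
    """One-sided p-value for H0: v'theta = 0 given Gamma y >= u.
    Naive CDF differences; not numerically robust far out in the tails."""
    lo, up = truncation_limits(y, Gamma, u, v)
    s = sigma * math.sqrt(dot(v, v))
    num = norm_cdf(up / s) - norm_cdf(dot(v, y) / s)
    den = norm_cdf(up / s) - norm_cdf(lo / s)
    return num / den


# Toy example (hypothetical): two orthonormal predictors, so the first
# forward-stepwise step picks variable 1 with positive sign iff y1 >= |y2|.
y = [2.0, 1.0]
Gamma = [[1.0, -1.0], [1.0, 1.0]]   # y1 - y2 >= 0 and y1 + y2 >= 0
u = [0.0, 0.0]
v = [1.0, 0.0]                      # test the coefficient of variable 1
v_lo, v_up = truncation_limits(y, Gamma, u, v)
p_value = selective_pvalue(y, Gamma, u, v, sigma=1.0)
```

In this toy case the truncation interval is [1, ∞), so the selective p-value is roughly 0.14, noticeably larger than the unadjusted one-sided value of about 0.023 that ignores selection.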
