A Selective Approach to Internal Inference

A common goal in modern biostatistics is to form a biomarker signature from high-dimensional gene expression data that is predictive of some outcome of interest. After learning this biomarker signature, an important question is how well it predicts the response compared to classical predictors. This is challenging because the biomarker signature is an internal predictor: one that has been learned using the same dataset on which we want to evaluate its significance. We propose a new method for approaching this problem based on the technique of selective inference. Simulations show that our method properly controls the level of the test and that, in certain settings, it has more power than sample splitting.
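
To illustrate why an internal predictor is hard to evaluate, the sketch below simulates pure-noise data in which no gene is related to the outcome. It is not the selective inference method proposed here. It only contrasts a naive test, which reuses the data that chose the predictor, with sample splitting. The one-gene "signature" chosen by marginal screening, the Fisher z-test, and all function names and default parameters are assumptions made for this illustration.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value for H0: correlation = 0 (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n=50, p=50, reps=200, alpha=0.05, seed=0):
    """Return (naive, split) rejection rates under a global null.

    Hypothetical setup: the 'signature' is the single gene with the largest
    absolute correlation with the outcome (marginal screening).
    """
    rng = random.Random(seed)
    half = n // 2
    naive_rejections = 0
    split_rejections = 0
    for _ in range(reps):
        # Outcome and genes are independent noise, so every rejection is a false positive.
        y = [rng.gauss(0, 1) for _ in range(n)]
        genes = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

        # Naive: select and test on the same n samples.
        best = max(genes, key=lambda g: abs(correlation(g, y)))
        if fisher_z_pvalue(correlation(best, y), n) < alpha:
            naive_rejections += 1

        # Sample splitting: select on the first half, test on the held-out half.
        best = max(genes, key=lambda g: abs(correlation(g[:half], y[:half])))
        r_test = correlation(best[half:], y[half:])
        if fisher_z_pvalue(r_test, n - half) < alpha:
            split_rejections += 1

    return naive_rejections / reps, split_rejections / reps


if __name__ == "__main__":
    naive_rate, split_rate = simulate()
    print(f"naive rejection rate: {naive_rate:.2f}")
    print(f"split rejection rate: {split_rate:.2f}")
```

In this toy setting the naive test rejects far more often than the nominal level, while sample splitting stays near it at the cost of testing on only half of the samples. That loss of power is the trade-off that motivates looking for an alternative to splitting.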
