Paper information - A study of pre-validation

A study of pre-validation

Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test from pre-validation can be biased and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the nominal level and achieves roughly the same power as the analytical test.
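The abstract does not spell out the mechanics of pre-validation, so the following is only a minimal sketch of the general idea: each sample's score comes from a predictor built without that sample, using a K-fold split. The function name `prevalidate`, the parameters `n_folds` and `n_top`, and the toy difference-of-class-means predictor are all assumptions made for illustration. They are not the authors' method, and the permutation test proposed in the paper is not shown here.

```python
import numpy as np


def prevalidate(X, y, n_folds=5, n_top=10, seed=0):
    """Return one pre-validated score per sample (hypothetical helper).

    For each fold, a toy predictor is built from the other folds only and
    then applied to the held-out samples, so no sample's score depends on
    its own outcome.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    z = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        X_tr, y_tr = X[train], y[train]
        mean1 = X_tr[y_tr == 1].mean(axis=0)
        mean0 = X_tr[y_tr == 0].mean(axis=0)
        # Toy predictor (an assumption): keep the n_top features with the
        # largest class-mean difference, weighted by that difference.
        diff = mean1 - mean0
        top = np.argsort(-np.abs(diff))[:n_top]
        w = np.zeros(X.shape[1])
        w[top] = diff[top]
        midpoint = 0.5 * (mean1 + mean0)
        z[held_out] = (X[held_out] - midpoint) @ w
    return z


# Synthetic data for illustration only: 60 samples, 200 features,
# with the first 5 features shifted in class 1.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :5] += 2.0
z = prevalidate(X, y)
print(z.shape)
```

In a comparison like the one the abstract describes, a score such as `z` would then enter a regression model alongside the clinical predictors, in place of a score fitted on the full dataset.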

Robert Tibshirani | Holger Hofling
