Statistical Applications in Genetics and Molecular Biology

In microarray studies, an important problem is to compare a predictor of disease outcome derived from gene expression levels to standard clinical predictors. Comparing them on the same dataset that was used to derive the microarray predictor can lead to results strongly biased in favor of the microarray predictor. We propose a new technique called “pre-validation” for making a fairer comparison between the two sets of predictors. We study the method analytically and explore its application in a recent study on breast cancer.
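The abstract does not spell out the procedure, so the following is only a minimal sketch of one common reading of pre-validation: each sample's microarray score is computed by a rule fitted without that sample's fold, and the assembled scores are then compared with the clinical predictors. The centroid-difference rule, the function names `fit_centroid_rule` and `prevalidated_scores`, the fold count, and the pure-noise data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_centroid_rule(X, y):
    """Hypothetical stand-in for a microarray predictor:
    a linear score built from the difference of the two class means."""
    mu1 = X[y == 1].mean(axis=0)
    mu0 = X[y == 0].mean(axis=0)
    w = mu1 - mu0
    b = -0.5 * (mu1 + mu0) @ w
    return w, b

def prevalidated_scores(X, y, n_folds=5, seed=0):
    """Score every sample with a rule fitted on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    scores = np.empty(len(y))
    for k in range(n_folds):
        held = folds == k
        w, b = fit_centroid_rule(X[~held], y[~held])
        scores[held] = X[held] @ w + b
    return scores

# Assumed toy data: expression values are pure noise, unrelated to outcome.
rng = np.random.default_rng(1)
n, p = 60, 500
X = rng.standard_normal((n, p))
y = np.repeat([0, 1], n // 2)

# Re-substitution: fit and score on the same samples (the biased comparison).
w, b = fit_centroid_rule(X, y)
resub = X @ w + b

# Pre-validation: each score comes from a rule that never saw that sample.
preval = prevalidated_scores(X, y)

resub_corr = np.corrcoef(resub, y)[0, 1]
preval_corr = np.corrcoef(preval, y)[0, 1]
print(f"re-substitution correlation with outcome: {resub_corr:.2f}")
print(f"pre-validated correlation with outcome:   {preval_corr:.2f}")
```

Because the simulated expression data carry no signal, any strong association between the re-substitution score and the outcome is an artifact of reusing the data. In an actual comparison, the pre-validated score would be entered into a model alongside the clinical predictors in place of the re-substitution score.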
