Statistical Applications in Genetics and Molecular Biology

We propose a method for prediction in Cox's proportional hazards model when the number of features (regressors), p, exceeds the number of observations, n. The method assumes that the features are independent in each risk set, so that the partial likelihood factors into a product. As such, it is analogous to univariate thresholding in linear regression and nearest shrunken centroids in classification. We call the procedure Cox univariate shrinkage. The method has the attractive property of being essentially univariate in its operation: the features are entered into the model based on the size of their Cox score statistics. We illustrate the new method on real and simulated data, and compare it to other proposed methods for survival prediction with a large number of predictors.
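
To make the "essentially univariate" ordering concrete, the following is a minimal sketch of how per-feature Cox score statistics might be computed and used to rank features. It covers only the ranking step, not the shrinkage procedure itself, which the abstract does not specify. The function name `cox_score_statistics`, the evaluation of the score at a coefficient of zero, the Breslow-style handling of tied event times, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def cox_score_statistics(X, time, event):
    """Standardized univariate Cox score statistic for each column of X,
    evaluated at coefficient 0 (illustrative helper, not from the paper)."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    p = X.shape[1]
    U = np.zeros(p)  # score: sum over events of (x_i - risk-set mean)
    V = np.zeros(p)  # information: sum over events of risk-set variance
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]          # risk set at this event time
        Xr = X[at_risk]
        mean = Xr.mean(axis=0)
        U += X[i] - mean
        V += (Xr ** 2).mean(axis=0) - mean ** 2
    z = np.zeros(p)
    ok = V > 0                             # leave z = 0 where variance is 0
    z[ok] = U[ok] / np.sqrt(V[ok])
    return z

# Simulated example (assumed setup): only feature 0 affects the hazard.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
t_event = rng.exponential(scale=1.0 / np.exp(1.5 * X[:, 0]))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

z = cox_score_statistics(X, time, event)
ranking = np.argsort(-np.abs(z))           # features by decreasing |score|
print("top five features:", ranking[:5])
```

Ranking by the absolute score means each feature is assessed on its own, which is what keeps the cost linear in p even when p far exceeds n.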
