Covariance‐regularized regression and classification for high dimensional problems

Summary. We propose covariance‐regularized regression, a family of methods for prediction in high dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to improve prediction accuracy. An estimate of the inverse covariance matrix is obtained by maximizing the log‐likelihood of the data, under a multivariate normal model, subject to a penalty; this estimate is then used to compute coefficients for the regression of the response onto the features. We show that ridge regression, the lasso and the elastic net are special cases of covariance‐regularized regression, and we demonstrate that certain previously unexplored forms of covariance‐regularized regression can outperform existing methods in a range of situations. The covariance‐regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyse gene expression data sets with multiple class and survival outcomes.
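The two-step idea in the summary (first estimate the inverse covariance matrix of the features, then plug it into the regression) can be illustrated with a minimal sketch. It is not the penalized-likelihood estimator described above: as a stand-in, it uses a ridge-type shrunken estimate, (S_xx + lam * I)^{-1}, chosen only because it has a closed form. The function names `plug_in_coefficients` and `ridge_type_precision`, the simulated data and the value of `lam` are all assumptions made for illustration. With this choice of estimate, the plug-in coefficients coincide with ordinary ridge regression, which is consistent with the statement that ridge regression is a special case.

```python
import numpy as np


def plug_in_coefficients(theta_xx, X, y):
    """Coefficients obtained by plugging an inverse covariance estimate
    theta_xx into beta = theta_xx @ S_xy (X and y assumed centred)."""
    n = X.shape[0]
    s_xy = X.T @ y / n  # sample covariance between features and response
    return theta_xx @ s_xy


def ridge_type_precision(X, lam):
    """Hypothetical stand-in estimate: invert the sample covariance after
    adding lam to its diagonal. Not the penalized-likelihood maximizer."""
    n, p = X.shape
    s_xx = X.T @ X / n  # sample covariance of the features
    return np.linalg.inv(s_xx + lam * np.eye(p))


# Simulated high dimensional data (p > n), for illustration only.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y -= y.mean()

lam = 0.5
beta_hat = plug_in_coefficients(ridge_type_precision(X, lam), X, y)

# Ordinary ridge regression with penalty n * lam, for comparison.
beta_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
print(np.allclose(beta_hat, beta_ridge))
```

Replacing `ridge_type_precision` with a different shrunken estimate of the inverse covariance matrix changes the resulting regression method while leaving the plug-in step unchanged.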
