Efficient quadratic regularization for expression arrays.

Gene expression arrays typically have 50 to 100 samples and 1,000 to 20,000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have strained the available computational resources. In this article we describe a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings over naive implementations are possible, using standard transformations in numerical linear algebra.
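As an illustration of the kind of saving involved, the following is a minimal sketch for the ridge regression case only, assuming the transformation is a thin singular value decomposition X = U D V^T. With n samples and p >> n genes, the p-dimensional ridge problem can be solved on the n x n matrix R = U D and mapped back through V. The function names, the penalty value `lam`, and the simulated data are illustrative assumptions, not taken from the article, and the intercept is omitted for brevity.

```python
import numpy as np

def ridge_naive(X, y, lam):
    """Ridge solution via the p x p normal equations (costly when p >> n)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_svd(X, y, lam):
    """Same ridge solution, computed on an n x n reduced problem.

    Assumes n < p. Thin SVD: X = U diag(d) Vt, so X = R Vt with R = U diag(d).
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    R = U * d  # scales column j of U by d[j], i.e. U @ diag(d)
    k = R.shape[1]
    # Ridge regression of y on the reduced predictors R
    theta = np.linalg.solve(R.T @ R + lam * np.eye(k), R.T @ y)
    # Map the reduced coefficients back to the original p genes
    return Vt.T @ theta

# Simulated data with sizes loosely resembling a small expression array
rng = np.random.default_rng(0)
n, p = 20, 500
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0

beta_fast = ridge_svd(X, y, lam)
beta_slow = ridge_naive(X, y, lam)
agree = np.allclose(beta_fast, beta_slow)
```

Here `agree` checks that the two routes give the same coefficients. The reduced route solves an n x n system after one SVD, whereas the naive route factors a p x p matrix, which is where the saving comes from when p is in the thousands.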
