Supervised Learning from Microarray Data

Gene expression arrays pose challenging problems for most traditional supervised learning techniques. We present a discussion of some of the issues involved. We then propose a simple approach to class prediction for DNA microarrays, based on an enhancement of the nearest centroid classifier. Our technique uses soft-thresholded class centroids as prototypes for each class. The shrinkage significantly improves prediction performance and identifies a subset of the genes most responsible for class separation. The method performs as well as or better than competitors from the literature, and is easy to understand and interpret. We illustrate the technique on data from three studies: small round blue cell tumors, leukemia and breast cancer.
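
The idea of soft-thresholded class centroids can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the implementation evaluated in the studies above: the function names, the per-gene scaling by a pooled within-class standard deviation plus its median, and the equal class priors are all choices made here for the example. Each class centroid is expressed as a standardized offset from the overall centroid, the offset is shrunk toward zero by a threshold `delta`, and a gene whose offsets vanish in every class no longer contributes to classification.

```python
import numpy as np

def soft_threshold(d, delta):
    """Shrink each value toward zero by delta; values within delta of zero become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_shrunken_centroids(X, y, delta):
    """X: (samples, genes) expression matrix; y: class labels; delta: threshold.

    Hypothetical helper for illustration only. The scaling below is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    classes, idx = np.unique(y, return_inverse=True)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.vstack([X[idx == k].mean(axis=0) for k in range(len(classes))])
    # Pooled within-class standard deviation per gene.
    resid = X - centroids[idx]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    # Assumed offset: adding the median guards against genes with tiny variance.
    s = s + np.median(s)
    d = (centroids - overall) / s          # standardized centroid offsets
    d_shrunk = soft_threshold(d, delta)    # soft-thresholding step
    shrunk = overall + s * d_shrunk        # shrunken class centroids
    active = np.any(d_shrunk != 0.0, axis=0)  # genes that survive shrinkage
    return classes, shrunk, s, active

def predict(model, X):
    """Assign each sample to the nearest shrunken centroid (equal priors assumed)."""
    classes, shrunk, s, _ = model
    dist = (((X[:, None, :] - shrunk[None, :, :]) / s) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

# Synthetic demo: only gene 0 differs between the two classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 50))
y_demo = np.repeat(np.array(["A", "B"]), 40)
X_demo[y_demo == "B", 0] += 5.0
model_demo = fit_shrunken_centroids(X_demo, y_demo, delta=0.5)
print("genes kept:", np.flatnonzero(model_demo[3]))
print("training accuracy:", (predict(model_demo, X_demo) == y_demo).mean())
```

In practice the threshold would be chosen by cross-validation rather than fixed in advance; larger values of `delta` retain fewer genes.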
