Penalized classification using Fisher's linear discriminant

Summary.  We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p≫n, LDA is not appropriate for two reasons. First, the standard estimate for the within‐class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performances of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.

[1]  N. L. Johnson,et al.  Multivariate Analysis , 1958, Nature.

[2]  F. Clarke Optimization And Nonsmooth Analysis , 1983 .

[3]  J. Friedman Regularized Discriminant Analysis , 1989 .

[4]  W. V. McCarthy,et al.  Discriminant Analysis with Singular Covariance Matrices: Methods and Applications to Spectroscopic Data , 1995 .

[5]  R. Tibshirani,et al.  Penalized Discriminant Analysis , 1995 .

[6]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[7]  D. Hunter,et al.  Optimization Transfer Using Surrogate Objective Functions , 2000 .

[8]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[10]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[11]  I. Jolliffe,et al.  A Modified Principal Component Technique Based on the LASSO , 2003 .

[12]  Trevor Hastie,et al.  Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays , 2003 .

[13]  T. Hastie,et al.  Classification of gene microarrays by penalized logistic regression. , 2004, Biostatistics.

[14]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[15]  D. Hunter,et al.  A Tutorial on MM Algorithms , 2004 .

[16]  P. Bickel,et al.  Some theory for Fisher''s linear discriminant function , 2004 .

[17]  Dennis B. Troup,et al.  NCBI GEO: mining millions of expression profiles—database and tools , 2004, Nucleic Acids Res..

[18]  R. Steele,et al.  Optimization , 2005, Encyclopedia of Biometrics.

[19]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[20]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[21]  R. Tibshirani,et al.  Sparse Principal Component Analysis , 2006 .

[22]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[23]  Jayant P. Menon,et al.  Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. , 2006, Cancer cell.

[24]  Jurjen Duintjer Tebbens,et al.  Improving implementation of linear discriminant analysis for the high dimension/small sample size problem , 2007, Comput. Stat. Data Anal..

[25]  Trevor Hastie,et al.  Regularized linear discriminant analysis and its application in microarrays. , 2007, Biostatistics.

[26]  Ian T. Jolliffe,et al.  DALASS: Variable selection in discriminant analysis via the LASSO , 2007, Comput. Stat. Data Anal..

[27]  Kathrin Klamroth,et al.  Biconvex sets and optimization with biconvex functions: a survey and extensions , 2007, Math. Methods Oper. Res..

[28]  Tsutomu Ohta,et al.  Gene expression analysis of soft tissue sarcomas: characterization and reclassification of malignant fibrous histiocytoma , 2007, Modern Pathology.

[29]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[30]  Jianqing Fan,et al.  High Dimensional Classification Using Features Annealed Independence Rules. , 2007, Annals of statistics.

[31]  Jianhua Z. Huang,et al.  Sparse principal component analysis via regularized low rank matrix approximation , 2008 .

[32]  Chenlei Leng,et al.  Sparse optimal scoring for multiclass cancer diagnosis and biomarker detection using microarray data , 2008, Comput. Biol. Chem..

[33]  Brian Knutson,et al.  Interpretable Classifiers for fMRI Improve Prediction of Purchases , 2008, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[34]  Ping Xu,et al.  Modified linear discriminant analysis approaches for classification of high-dimensional microarray data , 2009, Comput. Stat. Data Anal..

[35]  Holger Hoefling A Path Algorithm for the Fused Lasso Signal Approximator , 2009, 0910.0526.

[36]  R. Tibshirani,et al.  A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. , 2009, Biostatistics.

[37]  R. Tibshirani,et al.  Covariance‐regularized regression and classification for high dimensional problems , 2009, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[38]  Trevor J. Hastie,et al.  Sparse Discriminant Analysis , 2011, Technometrics.

[39]  J. Shao,et al.  Sparse linear discriminant analysis by thresholding for high dimensional data , 2011, 1105.3561.

[40]  Nicholas A. Johnson,et al.  A Dynamic Programming Algorithm for the Fused Lasso and L 0-Segmentation , 2013 .