Efficient model selection for regularized linear discriminant analysis

Classical Linear Discriminant Analysis (LDA) is not applicable to small sample size problems because the scatter matrices involved are singular. Regularized LDA (RLDA) provides a simple strategy to overcome the singularity problem by applying a regularization term, which is commonly estimated via cross-validation from a set of candidates. However, cross-validation may be computationally prohibitive when the candidate set is large. An efficient algorithm for RLDA is presented that computes the optimal transformation of RLDA for a large set of parameter candidates, at approximately the same cost as running RLDA a small number of times. It thus facilitates efficient model selection for RLDA. An intrinsic relationship between RLDA and Uncorrelated LDA (ULDA), which was recently proposed for dimension reduction and classification, is also presented. More specifically, RLDA is shown to approach ULDA as the regularization value tends to zero; that is, RLDA without any regularization is equivalent to ULDA. It can further be shown that ULDA maps all data points from the same class to a common point, under a mild condition which has been shown to hold for many high-dimensional datasets. This leads to the overfitting problem in ULDA, which has been observed in several applications. The theoretical analysis presented provides further justification for the use of regularization in RLDA. Extensive experiments confirm the claimed theoretical estimate of efficiency. Experiments also show that, for a properly chosen regularization parameter, RLDA performs favorably in classification compared with ULDA, as well as with other existing LDA-based algorithms and Support Vector Machines (SVM).
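To make the regularization idea concrete, the following is a minimal sketch of one common formulation of RLDA: the projection is taken from the leading eigenvectors of (S_t + λI)^{-1} S_b, where S_t and S_b are the total and between-class scatter matrices and λ > 0 keeps the system nonsingular when the sample size is smaller than the dimension. This is an illustrative assumption about the basic RLDA criterion, not the paper's efficient model-selection algorithm, and the function name `rlda_transform` is hypothetical.

```python
import numpy as np

def rlda_transform(X, y, reg):
    """Sketch of regularized LDA: eigenvectors of (St + reg*I)^{-1} Sb.

    X   : (n_samples, n_features) data matrix
    y   : integer class labels
    reg : regularization value lambda > 0
    Returns a projection matrix with at most (n_classes - 1) columns.
    """
    classes = np.unique(y)
    d = X.shape[1]
    mean = X.mean(axis=0)

    # Between-class scatter: weighted outer products of class-mean offsets.
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)

    # Total scatter; singular whenever n_samples - 1 < n_features.
    Xt = X - mean
    St = Xt.T @ Xt

    # Regularization makes St + reg*I nonsingular, so the solve is well posed.
    M = np.linalg.solve(St + reg * np.eye(d), Sb)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[: len(classes) - 1]
    return eigvecs.real[:, order]
```

Model selection would then evaluate this transformation for each candidate λ, which is exactly the repeated cost the paper's algorithm is designed to avoid.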
