Null space versus orthogonal linear discriminant analysis

Dimensionality reduction is an important pre-processing step for many applications. Linear Discriminant Analysis (LDA) is one of the well-known methods for supervised dimensionality reduction. However, the classical LDA formulation requires the scatter matrices involved to be nonsingular. For undersampled problems, where the data dimension is much larger than the sample size, all scatter matrices are singular and classical LDA fails. Many extensions, including null-space-based LDA (NLDA) and orthogonal LDA (OLDA), have been proposed to overcome this problem. In this paper, we present a computational and theoretical analysis of NLDA and OLDA. Our main result shows that under a mild condition, which holds in many applications involving high-dimensional data, NLDA is equivalent to OLDA. We have performed extensive experiments on various types of data, and the results are consistent with our theoretical analysis. The presented analysis and experimental results provide further insight into several LDA-based algorithms.
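The following Python/NumPy sketch is an illustration, not code from the paper: the synthetic data, random seed, null-space tolerance, and variable names are all assumptions. It makes the two claims above concrete for a toy undersampled problem: it builds the scatter matrices and verifies that they are singular, and then checks numerically that the NLDA and OLDA subspaces coincide under the paper's rank condition rank(Sb) + rank(Sw) = rank(St), which holds generically when high-dimensional samples are linearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undersampled problem: d = 100 features, n = 15 samples, k = 3 classes.
d, n_per_class, k = 100, 5, 3
X = rng.standard_normal((k * n_per_class, d))
y = np.repeat(np.arange(k), n_per_class)

# Factor matrices: St = Ht @ Ht.T, Sw = Hw @ Hw.T, Sb = Hb @ Hb.T.
mu = X.mean(axis=0)
Ht = (X - mu).T                                              # d x n
Hw = np.hstack([(X[y == c] - X[y == c].mean(axis=0)).T for c in range(k)])
Hb = np.column_stack([np.sqrt(np.sum(y == c)) * (X[y == c].mean(axis=0) - mu)
                      for c in range(k)])                    # d x k

# All scatter matrices are singular here: rank(St) <= n - 1 = 14, far below d = 100.
print("rank(St) =", np.linalg.matrix_rank(Ht))

# OLDA: SVD-based solution of the generalized criterion, then orthonormalize.
U1, s1, _ = np.linalg.svd(Ht, full_matrices=False)
t = np.linalg.matrix_rank(Ht)
U1, s1 = U1[:, :t], s1[:t]                                   # basis of range(St)
B = (U1 / s1).T @ Hb                                         # Sigma1^{-1} U1^T Hb
P, _, _ = np.linalg.svd(B, full_matrices=False)
q = np.linalg.matrix_rank(B)                                 # = rank(Sb), here k - 1
G_olda, _ = np.linalg.qr((U1 / s1) @ P[:, :q])               # orthonormal columns

# NLDA: maximize between-class scatter inside null(Sw) intersected with range(St).
HwU = U1.T @ Hw
evals, evecs = np.linalg.eigh(HwU @ HwU.T)                   # Sw restricted to range(St)
N = U1 @ evecs[:, evals < 1e-10 * evals.max()]               # null-space basis
HbN = N.T @ Hb
evals2, evecs2 = np.linalg.eigh(HbN @ HbN.T)                 # Sb in the null space
G_nlda = N @ evecs2[:, ::-1][:, :q]                          # top-q eigenvectors

# Equivalence check: the cosines of the principal angles between the two subspaces
# are the singular values of G_olda.T @ G_nlda; all ~1 means identical subspaces.
print("principal-angle cosines:",
      np.round(np.linalg.svd(G_olda.T @ G_nlda, compute_uv=False), 6))
```

On this data, rank(St) = 14 while d = 100, so classical LDA is undefined; the two extensions nonetheless yield the same two-dimensional subspace (all principal-angle cosines near 1), consistent with the stated equivalence result.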
