Discriminant Learning Analysis

Linear discriminant analysis (LDA) is a dimension reduction method widely used in classification tasks such as face recognition. However, it suffers from the small sample size (SSS) problem when the data dimensionality exceeds the sample size, as with images, whose features are high dimensional and correlated. In this paper, we propose to address the SSS problem within the framework of statistical learning theory. We compute linear discriminants by regularized least squares regression, which resolves the singularity problem. The resulting discriminants are complete in that they include both regular and irregular information. We show that our proposal and its nonlinear extension belong to the same framework in which powerful classifiers such as support vector machines are formulated. In addition, our approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical analysis.
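
The idea of obtaining discriminants from regularized least squares can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes a ridge (Tikhonov) penalty, a centered class-indicator matrix as the regression target, and a hypothetical function name `rls_discriminants` with a hypothetical parameter `lam` for the regularization strength. The point it shows is that adding `lam * I` keeps the linear system solvable even when the dimensionality exceeds the sample size.

```python
import numpy as np


def rls_discriminants(X, y, lam=1.0):
    """Ridge regression of centered class indicators on centered features.

    Illustrative sketch only; the name, the target coding, and `lam`
    are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # One-hot class indicator matrix, shape (n_samples, n_classes).
    Y = (y[:, None] == classes[None, :]).astype(float)

    # Center features and targets so no intercept term is needed.
    mean = X.mean(axis=0)
    Xc = X - mean
    Yc = Y - Y.mean(axis=0)

    n, d = Xc.shape
    if d > n:
        # Small sample size case: solve the n-by-n dual system.
        # Xc @ Xc.T + lam * I is positive definite for lam > 0.
        W = Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), Yc)
    else:
        # Otherwise solve the d-by-d primal system.
        W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc)
    return W, mean, classes


# Usage on synthetic data with more features (50) than samples (20).
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 10)
X_demo = rng.normal(size=(20, 50)) + 3.0 * y_demo[:, None]
W_demo, mean_demo, _ = rls_discriminants(X_demo, y_demo, lam=1.0)
Z_demo = (X_demo - mean_demo) @ W_demo  # samples projected onto the discriminants
print(Z_demo.shape)
```

The primal and dual systems give the same solution for any `lam > 0`; the sketch picks whichever is smaller. Without the penalty, the `d`-by-`d` matrix `Xc.T @ Xc` is singular whenever `d` exceeds `n`, which is the singularity the abstract refers to.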
