Analysis of Chernoff criterion for linear dimensionality reduction

The well-known linear discriminant analysis (LDA) based on the Fisher criterion cannot deal with heteroscedasticity in data. In many practical applications, however, the data are heteroscedastic, i.e., the within-class scatter matrices cannot be expected to be equal. A technique based on the Chernoff criterion for linear dimensionality reduction has been proposed recently. The technique extends Fisher's LDA and can exploit information about heteroscedasticity in the data. While the Chernoff criterion has been shown to outperform the Fisher criterion, a clear understanding of its exact behavior is lacking. In addition, the criterion, as introduced, is rather complex, which makes it difficult to state clearly how it relates to other linear dimensionality reduction techniques. In this paper, we show precisely what can be expected from the Chernoff criterion and how it relates to the Fisher criterion and the Fukunaga-Koontz transform. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data.
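To illustrate why heteroscedasticity matters, the following minimal sketch compares the two-class Fisher criterion with the standard Chernoff distance between two Gaussian class models. It is not the criterion analyzed in the paper, only the textbook Gaussian Chernoff distance; the function names, the priors, the choice alpha = 0.5, and the toy covariances are all assumptions made for illustration.

```python
import numpy as np


def fisher_criterion(m1, S1, m2, S2, p1=0.5):
    """Two-class Fisher criterion trace(S_W^{-1} S_B) (illustrative helper)."""
    p2 = 1.0 - p1
    d = m1 - m2
    S_W = p1 * S1 + p2 * S2          # average within-class scatter
    S_B = p1 * p2 * np.outer(d, d)   # between-class scatter for two classes
    return float(np.trace(np.linalg.solve(S_W, S_B)))


def chernoff_distance(m1, S1, m2, S2, alpha=0.5):
    """Chernoff distance between N(m1, S1) and N(m2, S2), with 0 < alpha < 1."""
    d = m1 - m2
    S = alpha * S1 + (1.0 - alpha) * S2
    # Term driven by the difference in class means.
    mean_term = 0.5 * alpha * (1.0 - alpha) * float(d @ np.linalg.solve(S, d))
    # Term driven by the difference in class covariances.
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_S - alpha * logdet_1 - (1.0 - alpha) * logdet_2)
    return mean_term + float(cov_term)


# Toy heteroscedastic case (assumed values): equal means, unequal covariances.
m = np.zeros(2)
S1 = np.diag([1.0, 1.0])
S2 = np.diag([4.0, 1.0])

fisher_value = fisher_criterion(m, S1, m, S2)
chernoff_value = chernoff_distance(m, S1, m, S2)
print("Fisher criterion: ", fisher_value)
print("Chernoff distance:", chernoff_value)
```

With equal class means the between-class scatter vanishes, so the Fisher criterion carries no information, whereas the covariance term of the Chernoff distance remains positive. When the two covariances are equal, that covariance term is zero and only the mean term, which has the same form as the Fisher criterion, remains.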
