Exploiting Fisher and Fukunaga-Koontz Transforms in Chernoff Dimensionality Reduction

Knowledge discovery from big data demands effective representation of the data. However, big data are often characterized by high dimensionality, which makes knowledge discovery more difficult. Many techniques for dimensionality reduction have been proposed, including the well-known Fisher Linear Discriminant Analysis (LDA). However, the Fisher criterion is incapable of dealing with heteroscedasticity in the data. A technique for linear dimensionality reduction based on the Chernoff criterion has been proposed that is capable of exploiting heteroscedastic information in the data. While the Chernoff criterion has been shown to outperform the Fisher criterion, a clear understanding of its exact behavior is lacking. In this article, we show precisely what can be expected from the Chernoff criterion. In particular, we show that the Chernoff criterion exploits the Fisher and Fukunaga-Koontz transforms in computing its linear discriminants. Furthermore, we show that a recently proposed decomposition of the data space into four subspaces is incomplete. We provide arguments on how best to enrich the decomposition of the data space in order to account for heteroscedasticity in the data. Finally, we provide experimental results validating our theoretical analysis.
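For concreteness, the following is a brief restatement of the two criteria at issue, in notation of our own choosing (the article's exact notation may differ). With within-class scatter S_W and between-class scatter S_B, the Fisher criterion for a projection matrix A is

    J_F(A) = \operatorname{tr}\!\left[ (A S_W A^\top)^{-1} (A S_B A^\top) \right],

while the Chernoff distance between two Gaussian classes N(m_1, \Sigma_1) and N(m_2, \Sigma_2), with mixing parameter \alpha \in (0, 1) and \Sigma_\alpha = \alpha \Sigma_1 + (1 - \alpha) \Sigma_2, is

    C(\alpha) = \frac{\alpha (1 - \alpha)}{2}\, (m_1 - m_2)^\top \Sigma_\alpha^{-1} (m_1 - m_2)
              + \frac{1}{2} \ln \frac{\det \Sigma_\alpha}{(\det \Sigma_1)^{\alpha} (\det \Sigma_2)^{1 - \alpha}}.

The first (Mahalanobis) term of C(\alpha) measures mean separation, which is essentially what the Fisher criterion captures; the second, log-determinant term is nonzero exactly when \Sigma_1 \neq \Sigma_2, and it is through this term that a Chernoff-based criterion accounts for heteroscedasticity. In the two-class homoscedastic case the second term vanishes and the heteroscedastic extension of Loog and Duin reduces to classical LDA.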
