An ensemble approach to data fusion and its application to ATR

Ensemble methods provide a principled framework for building high-performance classifiers and for representing many types of data. As a result, these methods are well suited to making inferences about biometric and biological events. We introduce a novel ensemble method for combining multiple representations (or views). The method is a multiple-view generalization of AdaBoost. As in AdaBoost, base classifiers are built independently from each representation. Unlike AdaBoost, however, all views share a single sampling distribution, computed from the base classifier with the smallest error rate among the input sources. As a result, the most consistent data type comes to dominate over time, significantly reducing sensitivity to noise. The method is applied to the problem of face and gender prediction from biometric traits. The new method outperforms several competing techniques, including kernel-based data fusion, and is provably better than AdaBoost trained on any single type of data.
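The shared-distribution boosting loop described above can be sketched in a few lines. The following is a minimal illustration, assuming binary labels in {-1, +1}, decision stumps as base learners, and the standard AdaBoost weight update; the function names (`shared_boost`, `predict`) and the scikit-learn stump are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def shared_boost(views, y, n_rounds=50):
    """Multi-view AdaBoost sketch: one sampling distribution is shared
    across all views and updated each round by the lowest-error view.

    views : list of (n_samples, n_features_v) arrays, one per view
    y     : labels in {-1, +1}
    """
    n = len(y)
    d = np.full(n, 1.0 / n)            # shared sampling distribution
    ensemble = []                       # (view index, stump, alpha)
    for _ in range(n_rounds):
        # Train one base classifier per view on the shared distribution,
        # then keep the view whose classifier has the smallest weighted error.
        best = None
        for v, X in enumerate(views):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=d)
            err = np.sum(d * (stump.predict(X) != y))
            if best is None or err < best[0]:
                best = (err, v, stump)
        err, v, stump = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # standard AdaBoost weight
        # All views share the distribution computed from the winning view.
        d *= np.exp(-alpha * y * stump.predict(views[v]))
        d /= d.sum()
        ensemble.append((v, stump, alpha))
    return ensemble

def predict(ensemble, views):
    """Weighted-vote prediction over the boosted multi-view ensemble."""
    score = sum(a * s.predict(views[v]) for v, s, a in ensemble)
    return np.sign(score)
```

Because every view's next base classifier is trained on the distribution induced by the single best classifier of the previous round, a noisy view rarely wins a round and therefore never steers the reweighting, which is the mechanism behind the noise robustness claimed above.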
