Theoretical Study of the Relationship between Diversity and Single-Class Measures for Class Imbalance Learning

This paper presents a theoretical study of the relationship between the diversity of classification ensembles and the single-class measures commonly used in class imbalance learning. Although diversity and its link to overall ensemble accuracy have been studied, little work has examined the impact of diversity on single-class performance measures in class imbalance learning. Class imbalance learning is important because many real-world problems, such as medical diagnosis, fraud detection, and condition monitoring, have imbalanced classes, where the minority class is usually more important and interesting than the majority class. To gain a deeper understanding of ensemble learning for imbalanced classes, this paper studies the impact of diversity on single-class performance measures both theoretically and empirically. One of its main objectives is to determine if and when ensemble diversity can improve classification performance on the important (minority) class.
