Diversity exploration and negative correlation learning on imbalanced data sets

Class imbalance learning is an important research area in machine learning, where instances in some classes heavily outnumber the instances in other classes. This unbalanced class distribution causes performance degradation. Some ensemble solutions have been proposed for the class imbalance problem. Diversity has been proved to be an influential aspect in ensemble learning, which describes the degree of different decisions made by classifiers. However, none of those proposed solutions explore the impact of diversity on imbalanced data sets. In addition, most of them are based on re-sampling techniques to rebalance class distribution, and over-sampling usually causes overfitting (high generalisation error). This paper investigates if diversity can relieve this problem by using negative correlation learning (NCL) model, which encourages diversity explicitly by adding a penalty term in the error function of neural networks. A variation model of NCL is also proposed - NCLCost. Our study shows that diversity has a direct impact on the measure of recall. It is also a factor that causes the reduction of F-measure. In addition, although NCL-based models with extreme settings do not produce better recall values of minority class than SMOTEBoost [1], they have slightly better performance of F-measure and G-mean than both independent ANNs and SMOTEBoost and better recall than independent ANNs.

[1]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[2]  G. Yule,et al.  On the association of attributes in statistics, with examples from the material of the childhood society, &c , 1900, Proceedings of the Royal Society of London.

[3]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[4]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[5]  Nathalie Japkowicz,et al.  A Novelty Detection Approach to Classification , 1995, IJCAI.

[6]  Nitesh V. Chawla,et al.  SMOTEBoost: Improving Prediction of the Minority Class in Boosting , 2003, PKDD.

[7]  JapkowiczNathalie,et al.  The class imbalance problem: A systematic study , 2002 .

[8]  Dimitris Kanellopoulos,et al.  Handling imbalanced datasets: A review , 2006 .

[9]  Herna L. Viktor,et al.  Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach , 2004, SKDD.

[10]  Gustavo E. A. P. A. Batista,et al.  A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[11]  Cen Li,et al.  Classifying imbalanced data using a bagging ensemble variation (BEV) , 2007, ACM-SE 45.

[12]  Rosa Maria Valdovinos,et al.  The Imbalanced Training Sample Problem: Under or over Sampling? , 2004, SSPR/SPR.

[13]  GuoHongyu,et al.  Learning from imbalanced data sets with boosting and data generation , 2004 .

[14]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[15]  Xin Yao,et al.  Diversity analysis on imbalanced data sets by using ensemble models , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[16]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[17]  Peter Tiño,et al.  Managing Diversity in Regression Ensembles , 2005, J. Mach. Learn. Res..

[18]  Taghi M. Khoshgoftaar,et al.  Experimental perspectives on learning from imbalanced data , 2007, ICML '07.

[19]  Nitesh V. Chawla,et al.  Editorial: special issue on learning from imbalanced data sets , 2004, SKDD.

[20]  Zhi-Hua Zhou,et al.  Ieee Transactions on Knowledge and Data Engineering 1 Training Cost-sensitive Neural Networks with Methods Addressing the Class Imbalance Problem , 2022 .

[21]  Xin Yao,et al.  Ensemble learning via negative correlation , 1999, Neural Networks.

[22]  Xin Yao,et al.  Diversity creation methods: a survey and categorisation , 2004, Inf. Fusion.

[23]  Xin Yao,et al.  Ensemble Learning Using Multi-Objective Evolutionary Algorithms , 2006, J. Math. Model. Algorithms.

[24]  Yong Zhao,et al.  All Zero Block Detection Based on Statistics for AVS-M Intra Frame Prediction , 2008, 2008 International Symposium on Intelligent Information Technology Application Workshops.

[25]  D.S. Anyfantis,et al.  Local cost sensitive learning for handling imbalanced data sets , 2007, 2007 Mediterranean Conference on Control & Automation.