A learning framework for online class imbalance learning

Online learning has been shown to be very useful for a large number of applications in which data arrive continuously and a timely response is required. In many online applications, such as fault diagnosis in real-time control monitoring systems and intrusion detection in computer networks, the data stream can have a very skewed class distribution, a condition known as class imbalance. Classifying imbalanced data streams poses new challenges, which have attracted very little attention so far. As the first work that formally addresses this problem, this paper looks into the underlying issues, clarifies the research questions, and proposes a framework for online class imbalance learning that decomposes the learning task into three modules. Within the framework, we use a time decay function to capture the imbalance rate dynamically. We then propose a class imbalance detection method to determine the current imbalance status of the data stream. Based on this information, two resampling-based online learning algorithms are developed to tackle class imbalance in data streams. Three basic types of class imbalance change are discussed in our studies. The results suggest the usefulness of the learning framework: the proposed methods are shown to be effective in terms of both minority-class accuracy and overall performance in all three cases considered.
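
The abstract does not state the form of the time decay function, so the following is only a minimal sketch of how a time-decayed imbalance rate could be tracked one example at a time. It assumes an exponential decay: each class share is multiplied by a factor `decay` and the share of the observed class is topped up by `1 - decay`. The class name `DecayedClassSizes`, the `decay` parameter, and its default value are hypothetical and are not taken from the paper.

```python
class DecayedClassSizes:
    """Time-decayed estimate of each class's share of a data stream.

    Assumption (not from the abstract): exponential decay, so recent
    examples weigh more than old ones.
    """

    def __init__(self, classes, decay=0.9):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.decay = decay
        self.sizes = {c: 0.0 for c in classes}

    def update(self, label):
        """Fold one newly observed true label into the estimates."""
        for c in self.sizes:
            indicator = 1.0 if c == label else 0.0
            self.sizes[c] = (self.decay * self.sizes[c]
                             + (1.0 - self.decay) * indicator)
        return self.sizes

    def minority(self):
        """Return the class with the smallest current estimated share."""
        return min(self.sizes, key=self.sizes.get)


# Usage: feed labels as they arrive and query the current status.
tracker = DecayedClassSizes(classes=[0, 1], decay=0.5)
for y in [1, 0, 0]:
    tracker.update(y)
```

Because old observations fade geometrically, an estimate of this kind can follow changes in the imbalance status over time, which a running count over the whole stream would not. A smaller `decay` reacts faster to change but gives noisier estimates.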