Resampling-Based Ensemble Methods for Online Class Imbalance Learning

Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams that have very skewed class distributions. Problems of this type commonly arise in real-world applications, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. In our earlier work, we defined class imbalance in an online setting and proposed two learning algorithms, OOB and UOB (oversampling-based and undersampling-based online bagging), which build an ensemble model that overcomes class imbalance in real time through resampling and time-decayed metrics. In this paper, we further improve the resampling strategy inside OOB and UOB and examine their performance on both static and dynamic data streams. We give the first comprehensive analysis of class imbalance in data streams in terms of data distributions, imbalance rates, and changes in class imbalance status. We find that UOB is better at recognizing minority-class examples in static data streams, whereas OOB is more robust against dynamic changes in class imbalance status. The data distribution is a major factor affecting their performance. Based on these insights, we propose two new ensemble methods, WEOB1 and WEOB2, which maintain both OOB and UOB and combine them with adaptive weights for final predictions. They are shown to combine the strengths of OOB and UOB, achieving good accuracy and robustness.
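The abstract names two ingredients of OOB and UOB: time-decayed class sizes and resampling inside online bagging. The Python sketch below shows one way these could fit together. It is an illustration under stated assumptions, not the paper's implementation. It assumes Oza and Russell's online bagging, where each base learner is trained k ~ Poisson(λ) times on every arriving example. It also assumes a time-decayed class size w_k ← θ·w_k + (1 − θ)·[y = k] with decay factor θ (0.9 here is only an illustrative default). The sampling rates are assumed to be λ = max_j w_j / w_y for OOB and λ = min_j w_j / w_y for UOB. The names `DecayedClassSizes`, `ResamplingOnlineBagging`, and the toy nearest-class-mean base learner `CentroidLearner` are hypothetical. The uniform starting value for the class sizes is also an assumption.

```python
import math
import random
from typing import Sequence


def poisson(lam: float, rng: random.Random) -> int:
    """Sample Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class DecayedClassSizes:
    """Time-decayed class proportions: w_k <- theta * w_k + (1 - theta) * [y == k]."""

    def __init__(self, n_classes: int, theta: float = 0.9):
        self.theta = theta
        # Assumption: start from a uniform estimate so no class size is zero.
        self.w = [1.0 / n_classes] * n_classes

    def update(self, y: int) -> None:
        for k in range(len(self.w)):
            indicator = 1.0 if k == y else 0.0
            self.w[k] = self.theta * self.w[k] + (1.0 - self.theta) * indicator


class CentroidLearner:
    """Hypothetical toy base learner: online nearest-class-mean classifier."""

    def __init__(self, n_classes: int, dim: int):
        self.sums = [[0.0] * dim for _ in range(n_classes)]
        self.counts = [0] * n_classes

    def update(self, x: Sequence[float], y: int) -> None:
        for i, v in enumerate(x):
            self.sums[y][i] += v
        self.counts[y] += 1

    def predict(self, x: Sequence[float]) -> int:
        best, best_dist = 0, math.inf
        for k, count in enumerate(self.counts):
            if count == 0:
                continue  # this learner has not seen class k yet
            dist = sum((xi - s / count) ** 2 for xi, s in zip(x, self.sums[k]))
            if dist < best_dist:
                best, best_dist = k, dist
        return best


class ResamplingOnlineBagging:
    """Sketch of OOB ("oob") or UOB ("uob") on top of Poisson-based online bagging."""

    def __init__(self, n_classes: int, dim: int, n_learners: int = 10,
                 theta: float = 0.9, mode: str = "oob", seed: int = 0):
        if mode not in ("oob", "uob"):
            raise ValueError("mode must be 'oob' or 'uob'")
        self.mode = mode
        self.n_classes = n_classes
        self.rng = random.Random(seed)
        self.sizes = DecayedClassSizes(n_classes, theta)
        self.learners = [CentroidLearner(n_classes, dim) for _ in range(n_learners)]

    def sampling_rate(self, y: int) -> float:
        # Assumed rule: OOB scales up to the largest class, UOB down to the smallest.
        w = self.sizes.w
        reference = max(w) if self.mode == "oob" else min(w)
        return reference / w[y]

    def learn_one(self, x: Sequence[float], y: int) -> None:
        self.sizes.update(y)  # update first, so w[y] >= 1 - theta > 0
        lam = self.sampling_rate(y)
        for learner in self.learners:
            for _ in range(poisson(lam, self.rng)):
                learner.update(x, y)

    def predict_one(self, x: Sequence[float]) -> int:
        votes = [0] * self.n_classes
        for learner in self.learners:
            votes[learner.predict(x)] += 1
        return max(range(self.n_classes), key=votes.__getitem__)


if __name__ == "__main__":
    data_rng = random.Random(42)
    oob = ResamplingOnlineBagging(2, 1, mode="oob", seed=1)
    uob = ResamplingOnlineBagging(2, 1, mode="uob", seed=2)
    for _ in range(2000):
        y = 1 if data_rng.random() < 0.05 else 0  # roughly 5% minority class
        x = [data_rng.gauss(2.0 * y, 1.0)]
        oob.learn_one(x, y)
        uob.learn_one(x, y)
    print("OOB:", oob.predict_one([3.0]), "UOB:", uob.predict_one([3.0]))
```

Under these assumptions, OOB keeps the majority class at λ = 1 and replicates minority-class examples. UOB keeps the minority class at λ = 1 and thins out majority-class examples. Because the class sizes sum to 1 and w_y is at least 1 − θ right after the update, the OOB rate never exceeds 1/(1 − θ). A θ closer to 1 averages over a longer stretch of the stream, roughly 1/(1 − θ) recent examples.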
