A multi-objective ensemble method for online class imbalance learning

Online class imbalance learning is an emerging learning area that combines the challenges of both online learning and class imbalance learning. In addition to the learning difficulty from the imbalanced distribution, another major challenge is that the imbalanced rate in a data stream can be dynamically changing. OOB and UOB are two state-of-the-art methods for online class imbalance problems [1]. UOB is better at recognizing minority-class examples when the imbalance rate does not change much over time, while OOB is more prepared for the case with a dynamic rate. Aiming for an effective method for both static and dynamic cases, this paper proposes a multi-objective ensemble method MOSOB that combines OOB and UOB. MOSOB finds the Pareto-optimal weights for OOB and UOB at each time step, to maximize minority-class recall and majority-class recall simultaneously. Experiments on five real-world data applications show that MOSOB performs well in both static and dynamic data streams. Furthermore, we look into its performance on a group of highly imbalanced data streams. To respond to the minority class within 10000 time steps, the imbalance rate can be as low as 0.1% for easy data streams; at least 3% of imbalance rate is required to classify difficult data streams.

[1]  Francisco Herrera,et al.  An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[2]  Gregory Ditzler,et al.  Incremental Learning of Concept Drift from Streaming Imbalanced Data , 2013, IEEE Transactions on Knowledge and Data Engineering.

[3]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[4]  Zhiping Lin,et al.  Weighted Online Sequential Extreme Learning Machine for Class Imbalance Learning , 2013, Neural Processing Letters.

[5]  Xin Yao,et al.  DDD: A New Ensemble Approach for Dealing with Concept Drift , 2012, IEEE Transactions on Knowledge and Data Engineering.

[6]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[7]  Mark Johnston,et al.  Evolving Diverse Ensembles Using Genetic Programming for Classification With Unbalanced Data , 2013, IEEE Transactions on Evolutionary Computation.

[8]  Chaim Linhart,et al.  PAKDD Data Mining Competition 2009: New Ways of Using Known Methods , 2009, PAKDD Workshops.

[9]  Hadi Sadoghi Yazdi,et al.  Recursive least square perceptron model for non-stationary and imbalanced data stream classification , 2013, Evol. Syst..

[10]  Haibo He,et al.  Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach , 2011, Evol. Syst..

[11]  Harold W. Kuhn,et al.  Nonlinear programming: a historical view , 1982, SMAP.

[12]  Nathalie Japkowicz,et al.  A Novelty Detection Approach to Classification , 1995, IJCAI.

[13]  Peter J. Fleming,et al.  An Overview of Evolutionary Algorithms in Multiobjective Optimization , 1995, Evolutionary Computation.

[14]  Peter Tiño,et al.  Concept drift detection for online class imbalance learning , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[15]  Xin Yao,et al.  Online Class Imbalance Learning and its Applications in Fault Detection , 2013, Int. J. Comput. Intell. Appl..

[16]  Marios M. Polycarpou,et al.  Contaminant Event Monitoring in Intelligent Buildings Using a Multi-Zone Formulation , 2012 .

[17]  Xin Yao,et al.  Ieee Transactions on Knowledge and Data Engineering 1 Relationships between Diversity of Classification Ensembles and Single-class Performance Measures , 2022 .

[18]  Haibo He,et al.  MuSeRA: Multiple Selectively Recursive Approach towards imbalanced stream data mining , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[19]  Stuart J. Russell,et al.  Online bagging and boosting , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[20]  Xin Yao,et al.  A learning framework for online class imbalance learning , 2013, 2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL).

[21]  Hien M. Nguyen,et al.  Online learning from imbalanced data streams , 2011, 2011 International Conference of Soft Computing and Pattern Recognition (SoCPaR).

[22]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[23]  Xin Yao,et al.  Multiclass Imbalance Problems: Analysis and Potential Solutions , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[24]  A. Dawid,et al.  Prequential probability: principles and properties , 1999 .

[25]  Paolo Soda,et al.  A multi-objective optimisation approach for class imbalance learning , 2011, Pattern Recognit..