Online Class Imbalance Learning and its Applications in Fault Detection

Although class imbalance learning and online learning have each been studied extensively in the literature, online class imbalance learning, which must address the challenges of both fields, has drawn little attention. It concerns data streams with very skewed class distributions, which arise in applications such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. To fill this research gap and contribute to a wide range of real-world applications, this paper first formulates online class imbalance learning problems. Based on this formulation, a new online learning algorithm, sampling-based online bagging (SOB), is proposed to tackle class imbalance adaptively. We then study how SOB and other state-of-the-art methods can be applied to a class of fault detection data under various scenarios and analyze their performance in depth. Through extensive experiments, we find that SOB balances the performance between classes very well across different data domains and produces a stable G-mean when learning from constantly imbalanced data streams. However, it is sensitive to sudden changes in class imbalance, in which case its predecessor, undersampling-based online bagging (UOB), is more robust.
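
To make the G-mean criterion mentioned above concrete, the following is a minimal sketch of how it could be tracked over a stream. It uses the standard definition of G-mean as the geometric mean of the per-class recalls. The class name `OnlineGMean` and its methods are hypothetical names introduced for illustration; this is not the evaluation code used in the paper, and it applies no time decay or windowing.

```python
import math


class OnlineGMean:
    """Running G-mean: geometric mean of per-class recalls (illustrative only)."""

    def __init__(self, classes=(0, 1)):
        # Per-class counts of examples seen and examples classified correctly.
        self.seen = {c: 0 for c in classes}
        self.correct = {c: 0 for c in classes}

    def update(self, y_true, y_pred):
        # Update counts with one (label, prediction) pair from the stream.
        self.seen[y_true] += 1
        if y_true == y_pred:
            self.correct[y_true] += 1

    def recall(self, c):
        # Recall of class c so far; defined as 0.0 before any example of c arrives.
        return self.correct[c] / self.seen[c] if self.seen[c] else 0.0

    def value(self):
        recalls = [self.recall(c) for c in self.seen]
        return math.prod(recalls) ** (1.0 / len(recalls))


def accuracy(pairs):
    # Plain accuracy over a list of (label, prediction) pairs, for comparison.
    return sum(y == p for y, p in pairs) / len(pairs)


# Assumed toy stream: 95 majority (0) and 5 minority (1) examples,
# scored by a classifier that always predicts the majority class.
always_majority = [(0, 0)] * 95 + [(1, 0)] * 5
gm = OnlineGMean()
for y, p in always_majority:
    gm.update(y, p)
print(accuracy(always_majority), gm.value())
```

On this toy stream the accuracy is high although the minority class is never detected, while the G-mean drops to zero. This is why a measure that rewards balanced per-class performance is preferred when the class distribution is skewed.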