DDD: A New Ensemble Approach for Dealing with Concept Drift

Online learning algorithms often have to operate in the presence of concept drifts. A recent study revealed that different diversity levels in an ensemble of learning machines are required in order to maintain high generalization on both old and new concepts. Inspired by this study and based on a further study of diversity with different strategies to deal with drifts, we propose a new online ensemble learning approach called Diversity for Dealing with Drifts (DDD). DDD maintains ensembles with different diversity levels and is able to attain better accuracy than other approaches. Furthermore, it is very robust, outperforming other drift handling approaches in terms of accuracy when there are false positive drift detections. In all the experimental comparisons we have carried out, DDD always performed at least as well as other drift handling approaches under various conditions, with very few exceptions.

[1]  Xin Yao,et al.  Using diversity to handle concept drift in on-line learning , 2009, 2009 International Joint Conference on Neural Networks.

[2]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[3]  Dimitris Kanellopoulos,et al.  Handling imbalanced datasets: A review , 2006 .

[4]  Jiawei Han,et al.  On Appropriate Assumptions to Mine Data Streams: Analysis and Practice , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[5]  A. Dawid,et al.  Prequential probability: principles and properties , 1999 .

[6]  Kenneth O. Stanley Learning Concept Drift with a Committee of Decision Trees , 2003 .

[7]  Xin Yao,et al.  An analysis of diversity measures , 2006, Machine Learning.

[8]  JefI’rty C. Schlirrlrrer Beyond incremental processing : Tracking concept drift , 1999 .

[9]  MSc Frsnz Nikola Kasabov PhD Evolving Connectionist Systems , 2003, Perspectives in Neural Computing.

[10]  Paul E. Utgoff,et al.  Decision Tree Induction Based on Efficient Tree Restructuring , 1997, Machine Learning.

[11]  João Gama,et al.  Learning with Drift Detection , 2004, SBIA.

[12]  Xin Yao,et al.  The Impact of Diversity on Online Ensemble Learning in the Presence of Concept Drift , 2010, IEEE Transactions on Knowledge and Data Engineering.

[13]  Yong Zhao,et al.  All Zero Block Detection Based on Statistics for AVS-M Intra Frame Prediction , 2008, 2008 International Symposium on Intelligent Information Technology Application Workshops.

[14]  M. Harries SPLICE-2 Comparative Evaluation: Electricity Pricing , 1999 .

[15]  David B. Skillicorn,et al.  Streaming Random Forests , 2007, 11th International Database Engineering and Applications Symposium (IDEAS 2007).

[16]  Marcus A. Maloof,et al.  Dynamic weighted majority: a new ensemble method for tracking concept drift , 2003, Third IEEE International Conference on Data Mining.

[17]  Ludmila I. Kuncheva,et al.  A framework for generating data to simulate changing environments , 2007, Artificial Intelligence and Applications.

[18]  G. Yule,et al.  On the association of attributes in statistics, with examples from the material of the childhood society, &c , 1900, Proceedings of the Royal Society of London.

[19]  Kagan Tumer,et al.  Error Correlation and Error Reduction in Ensemble Classifiers , 1996, Connect. Sci..

[20]  Vasant Honavar,et al.  Learn++: an incremental learning algorithm for supervised neural networks , 2001, IEEE Trans. Syst. Man Cybern. Part C.

[21]  Carlo Zaniolo,et al.  Fast and Light Boosting for Adaptive Mining of Data Streams , 2004, PAKDD.

[22]  Frank Kirchner,et al.  Performance evaluation of EANT in the robocup keepaway benchmark , 2007, ICMLA 2007.

[23]  Xin Yao,et al.  Negative correlation in incremental learning , 2007, Natural Computing.

[24]  Kyosuke Nishida,et al.  Learning and Detecting Concept Drift , 2008 .

[25]  Robert Givan,et al.  Online Ensemble Learning: An Empirical Study , 2000, Machine Learning.

[26]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[27]  Kyosuke Nishida,et al.  Adaptive Classifiers-Ensemble System for Tracking Concept Drift , 2007, 2007 International Conference on Machine Learning and Cybernetics.

[28]  Ralf Klinkenberg,et al.  Boosting classifiers for drifting concepts , 2007, Intell. Data Anal..

[29]  Xin Yao,et al.  On-line bagging Negative Correlation Learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[30]  Philip S. Yu,et al.  A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions , 2007, SDM.

[31]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[32]  Stuart J. Russell,et al.  Experimental comparisons of online and batch versions of bagging and boosting , 2001, KDD '01.

[33]  A. Bifet,et al.  Early Drift Detection Method , 2005 .

[34]  Marcus A. Maloof,et al.  Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts , 2007, J. Mach. Learn. Res..

[35]  Koichiro Yamauchi,et al.  Detecting Concept Drift Using Statistical Testing , 2007, Discovery Science.

[36]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[37]  Marcus A. Maloof,et al.  Using additive expert ensembles to cope with concept drift , 2005, ICML.

[38]  Stuart J. Russell,et al.  Online bagging and boosting , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[39]  Raj Bhatnagar,et al.  Tracking recurrent concept drift in streaming data using ensemble classifiers , 2007, Sixth International Conference on Machine Learning and Applications (ICMLA 2007).

[40]  Ralf Klinkenberg,et al.  An Ensemble Classifier for Drifting Concepts , 2005 .

[41]  Haibo He,et al.  IMORL: Incremental Multiple-Object Recognition and Localization , 2008, IEEE Transactions on Neural Networks.

[42]  William Nick Street,et al.  A streaming ensemble algorithm (SEA) for large-scale classification , 2001, KDD '01.