The Impact of Diversity on Online Ensemble Learning in the Presence of Concept Drift

Online learning algorithms often have to operate in the presence of concept drift (i.e., the concepts to be learned can change with time). This paper presents a new categorization for concept drift, separating drifts according to different criteria into mutually exclusive and non-heterogeneous categories. Moreover, although ensembles of learning machines have been used to learn in the presence of concept drift, there has been no deep study of why they can be helpful for that task or of which of their features contribute to it. As diversity is one such feature, we present an analysis of diversity in the presence of different types of drift. We show that, before a drift, ensembles with less diversity obtain lower test errors. On the other hand, maintaining highly diverse ensembles is a good strategy for obtaining lower test errors shortly after a drift, independently of the type of drift, even though high diversity is more important for more severe drifts. Longer after a drift, high diversity becomes less important. Diversity by itself can help to reduce the initial increase in error caused by a drift, but it does not provide faster recovery from drifts in the long term.
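The abstract does not spell out how diversity is varied. In the online-bagging literature associated with this line of work, one common mechanism is to replace the fixed Poisson(1) example weighting of standard online bagging with Poisson(λ), where a smaller λ shows each example to each base learner less often and therefore yields a more diverse ensemble. The sketch below is a minimal Python illustration under that assumption; the `MajorityClassLearner` stand-in and the per-example `partial_fit`/`predict` interface are hypothetical conveniences, not the paper's implementation.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng=random):
    """Sample k ~ Poisson(lam) using Knuth's algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class MajorityClassLearner:
    """Trivial incremental learner used as a stand-in base model:
    it predicts whichever class it has seen most often so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBaggingEnsemble:
    """Online bagging whose diversity level is controlled by lam:
    lam = 1.0 reproduces standard online bagging (low diversity),
    while lam < 1.0 presents each example to each base learner less
    often, so the learners disagree more (high diversity)."""

    def __init__(self, make_learner, n_learners=25, lam=1.0):
        self.learners = [make_learner() for _ in range(n_learners)]
        self.lam = lam

    def train_on(self, x, y):
        # Each base learner sees the example k times, k ~ Poisson(lam).
        for learner in self.learners:
            for _ in range(sample_poisson(self.lam)):
                learner.partial_fit(x, y)

    def predict(self, x):
        # Unweighted majority vote over the base learners.
        votes = Counter(learner.predict(x) for learner in self.learners)
        return votes.most_common(1)[0][0]


# Per the abstract's findings, a low-diversity ensemble (lam = 1.0)
# suits the period before a drift, while a high-diversity ensemble
# (small lam) obtains lower test errors shortly after a drift.
low_div = OnlineBaggingEnsemble(MajorityClassLearner, lam=1.0)
high_div = OnlineBaggingEnsemble(MajorityClassLearner, lam=0.05)
for x, y in [((0.1, 0.2), 0), ((0.9, 0.8), 1), ((0.2, 0.1), 0)]:
    low_div.train_on(x, y)
    high_div.train_on(x, y)
print(low_div.predict((0.15, 0.15)), high_div.predict((0.15, 0.15)))
```

A practical consequence of the abstract's findings, under the same assumption, is to run both configurations side by side: exploit the low-λ (high-diversity) ensemble immediately after a detected drift to absorb the initial error spike, then shift back toward the λ = 1 ensemble for long-term convergence.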
