Negative correlation in incremental learning

Negative Correlation Learning (NCL) has been successfully applied to construct neural network ensembles. It encourages the neural networks that compose the ensemble to be accurate and, at the same time, different from each other. Such diversity among the ensemble members is a desirable feature for incremental learning, since some of the networks may be able to adapt to new data faster and better than others. NCL is therefore a potentially powerful approach to incremental learning. With this in mind, this paper presents an analysis of NCL, aiming to determine its strengths and weaknesses for incremental learning. The analysis shows that NCL can be used to overcome catastrophic forgetting, an important problem in incremental learning. However, when catastrophic forgetting is very low, no advantage is taken of using more than one neural network of the ensemble to learn new data, and the test error is high. When all the neural networks are used to learn new data, some of them can indeed adapt better than others, but catastrophic forgetting is higher. It is therefore important to find a trade-off between overcoming catastrophic forgetting and using the entire ensemble to learn new data. The NCL results are comparable to those of other approaches specifically designed for incremental learning. The study presented in this work thus reveals encouraging results, indicating that negative correlation is a promising approach to incremental learning.
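To make the mechanism concrete, the sketch below illustrates the standard NCL penalty in the form introduced by Liu and Yao, where each member i minimises e_i = 1/2 (f_i - d)^2 + lambda * p_i with p_i = -(f_i - f_ens)^2 and f_ens is the ensemble average. This is a minimal NumPy sketch, not the setup of this paper's experiments: the small network architecture, the hyper-parameters (lam, lr, epochs) and the toy regression data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class SmallNet:
    """One hidden layer, tanh activation, scalar output (illustrative only)."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        self.h = np.tanh(x @ self.W1 + self.b1)
        return float(self.h @ self.w2 + self.b2)

    def backward(self, x, g, lr):
        # g is d(loss_i)/d(f_i); backpropagate it through this small net.
        dw2 = g * self.h
        dh = g * self.w2 * (1.0 - self.h ** 2)
        self.W1 -= lr * np.outer(x, dh)
        self.b1 -= lr * dh
        self.w2 -= lr * dw2
        self.b2 -= lr * g

def ncl_train(nets, X, y, lam=0.5, lr=0.05, epochs=100):
    """Train the ensemble with the NCL loss
       e_i = 0.5*(f_i - d)^2 - lam*(f_i - f_ens)^2."""
    M = len(nets)
    for _ in range(epochs):
        for x, d in zip(X, y):
            f = np.array([net.forward(x) for net in nets])
            f_ens = f.mean()
            for i, net in enumerate(nets):
                # d(e_i)/d(f_i), keeping the other members' outputs fixed;
                # the second term pushes f_i away from the ensemble mean,
                # which is what makes the members negatively correlated.
                g = (f[i] - d) - 2.0 * lam * (1.0 - 1.0 / M) * (f[i] - f_ens)
                net.backward(x, g, lr)
    return nets

# Toy usage: fit y = sin(3x) with an ensemble of four small nets and
# average their outputs to form the ensemble prediction.
X = rng.uniform(-1.0, 1.0, (200, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.05, 200)
ensemble = ncl_train([SmallNet(1, 8) for _ in range(4)], X, y)
pred = np.array([[net.forward(x) for net in ensemble] for x in X]).mean(axis=1)
```

With lambda = 0 each member is trained independently; increasing lambda trades individual accuracy for diversity among the members, which is the same trade-off the paper examines in the incremental-learning setting.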
