A constructive algorithm for training cooperative neural network ensembles

This paper presents a constructive algorithm for training cooperative neural-network ensembles (CNNEs). CNNE combines ensemble architecture design with cooperative training of the individual neural networks (NNs) in an ensemble. Unlike most previous studies on training ensembles, CNNE emphasizes both the accuracy of, and the diversity among, the individual NNs in an ensemble. To maintain the accuracy of individual NNs, the number of hidden nodes in each NN is also determined by a constructive approach. CNNE uses incremental training based on negative correlation to train individual NNs for different numbers of training epochs. The use of negative correlation learning and of different numbers of training epochs reflects CNNE's emphasis on diversity among the individual NNs in an ensemble. CNNE has been tested extensively on a number of benchmark problems in machine learning and neural networks, including the Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, soybean, and Mackey-Glass time-series prediction problems. The experimental results show that CNNE can produce NN ensembles with good generalization ability.
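The cooperative training step rests on negative correlation learning (NCL): each network i minimizes its own squared error plus a correlation penalty p_i = (F_i − F̄) Σ_{j≠i} (F_j − F̄), where F̄ is the ensemble's average output, so that the error gradient with respect to F_i simplifies to (F_i − y) − λ(F_i − F̄). The sketch below is a minimal, self-contained Python illustration of simultaneous NCL training for a fixed-size ensemble under stated assumptions (squared error, one-hidden-layer sigmoid networks); the names SmallMLP and ncl_epoch are invented here, and CNNE's constructive steps of adding hidden nodes and networks are deliberately omitted. It is not the paper's implementation.

```python
# A minimal sketch of simultaneous negative correlation learning (NCL) for a
# fixed-size ensemble of one-hidden-layer networks. Illustrative only: the
# class and function names (SmallMLP, ncl_epoch) and all hyperparameters are
# invented here; CNNE's constructive steps (adding hidden nodes and adding
# networks) are deliberately omitted.
import numpy as np

rng = np.random.default_rng(0)

class SmallMLP:
    """One-hidden-layer network: sigmoid hidden units, linear output."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        self.h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # cache activations
        return self.h @ self.w2 + self.b2

    def backward(self, X, delta, lr):
        """Gradient step given delta = dE/dF for each sample."""
        n = len(X)
        dh = np.outer(delta, self.w2) * self.h * (1.0 - self.h)  # backprop to hidden
        self.w2 -= lr * self.h.T @ delta / n
        self.b2 -= lr * delta.mean()
        self.W1 -= lr * X.T @ dh / n
        self.b1 -= lr * dh.mean(axis=0)

def ncl_epoch(nets, X, y, lam=0.5, lr=0.1):
    """One epoch of NCL: every network is trained against the same ensemble mean."""
    outputs = np.stack([net.forward(X) for net in nets])  # (M, N); caches each net's h
    f_bar = outputs.mean(axis=0)                          # ensemble average output
    for i, net in enumerate(nets):
        # NCL error gradient: (F_i - y) - lam * (F_i - Fbar).
        delta = (outputs[i] - y) - lam * (outputs[i] - f_bar)
        net.backward(X, delta, lr)
    return float(np.mean((f_bar - y) ** 2))               # ensemble MSE

# Toy usage: a 3-network ensemble on a noisy 1-D regression target.
X = rng.uniform(-1.0, 1.0, (200, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.1, 200)
nets = [SmallMLP(n_in=1, n_hidden=5) for _ in range(3)]
for epoch in range(2000):
    mse = ncl_epoch(nets, X, y)
print(f"ensemble training MSE: {mse:.4f}")
```

Setting lam = 0 reduces each update to independent backpropagation; raising lam toward 1 trades individual accuracy for diversity, since each network is pushed away from the ensemble mean while still tracking the target.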

[1]  Lars Kai Hansen,et al.  Neural Network Ensembles , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Pablo M. Granitto,et al.  A Late-Stopping Method for Optimal Aggregation of Neural Networks , 2001, Int. J. Neural Syst..

[3]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[4]  Shun-ichi Amari,et al.  Mutual information of sparsely coded associative memory with self-control and ternary neurons , 2000, Neural Networks.

[5]  David W. Opitz,et al.  Actively Searching for an Effective Neural Network Ensemble , 1996, Connect. Sci..

[6]  Bruce W. Schmeiser,et al.  Improving model accuracy using optimal linear combinations of trained neural networks , 1995, IEEE Trans. Neural Networks.

[7]  Anders Krogh,et al.  Neural Network Ensembles, Cross Validation, and Active Learning , 1994, NIPS.

[8]  Jianxin Wu,et al.  Genetic Algorithm based Selective Neural Network Ensemble , 2001, IJCAI.

[9]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[10]  J. Ross Quinlan,et al.  Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.

[11]  Michael J. Pazzani,et al.  Error reduction through learning multiple descriptions , 2004, Machine Learning.

[12]  James T. Kwok,et al.  Objective functions for training new hidden units in constructive neural networks , 1997, IEEE Trans. Neural Networks.

[13]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[14]  L. Glass,et al.  Oscillation and chaos in physiological control systems. , 1977, Science.

[15]  M. Pazzani,et al.  Error Reduction through Learning Multiple Descriptions , 1996, Machine Learning.

[16]  Lutz Prechelt,et al.  Some notes on neural learning algorithm benchmarking , 1995, Neurocomputing.

[17]  Farmer,et al.  Predicting chaotic time series. , 1987, Physical review letters.

[18]  A. Krogh,et al.  Statistical mechanics of ensemble learning , 1997 .

[19]  Xin Yao,et al.  Ensemble learning via negative correlation , 1999, Neural Networks.

[20]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[21]  Md. Monirul Islam,et al.  Exploring constructive algorithms with stopping criteria to produce accurate and diverse individual neural networks in an ensemble , 2001, 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236).

[22]  Timur Ash,et al.  Dynamic node creation in backpropagation networks , 1989 .

[23]  Rudy Setiono,et al.  Use of a quasi-Newton method in a feedforward neural network construction algorithm , 1995, IEEE Trans. Neural Networks.

[24]  Bruce E. Rosen,et al.  Ensemble Learning Using Decorrelated Neural Networks , 1996, Connect. Sci..

[25]  Lutz Prechelt,et al.  A quantitative study of experimental evaluations of neural network learning algorithms: Current research practice , 1996, Neural Networks.

[26]  A. Sharkey Linear and Order Statistics Combiners for Pattern Classification , 1999 .

[27]  Sherif Hashem,et al.  Optimal Linear Combinations of Neural Networks , 1997, Neural Networks.

[28]  Ferdinand Hergert,et al.  Improving model selection by nonconvergent methods , 1993, Neural Networks.

[29]  Leo Breiman,et al.  Stacked regressions , 2004, Machine Learning.

[30]  Yoshio Hirose,et al.  Backpropagation algorithm which varies the number of hidden units , 1991, International 1989 Joint Conference on Neural Networks.

[31]  Noel E. Sharkey,et al.  Combining diverse neural nets , 1997, The Knowledge Engineering Review.

[32]  Lutz Prechelt,et al.  PROBEN 1 - a set of benchmarks and benchmarking rules for neural network training algorithms , 1994 .

[33]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[34]  M. Pazzani,et al.  Learning probabilistic relational concept descriptions , 1996 .

[35]  Michael I. Jordan,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1994, Neural Computation.

[36]  James T. Kwok,et al.  Constructive algorithms for structure learning in feedforward neural networks for regression problems , 1997, IEEE Trans. Neural Networks.

[37]  Xin Yao,et al.  Making use of population information in evolutionary artificial neural networks , 1998, IEEE Trans. Syst. Man Cybern. Part B.

[38]  Kazuyuki Murase,et al.  A new algorithm to design compact two-hidden-layer artificial neural networks , 2001, Neural Networks.

[39]  Thomas Martinetz,et al.  'Neural-gas' network for vector quantization and its application to time-series prediction , 1993, IEEE Trans. Neural Networks.

[40]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[41]  Thomas G. Dietterich An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.

[42]  L. Breiman Stacked Regressions , 1996, Machine Learning.

[43]  D. Opitz,et al.  Popular Ensemble Methods: An Empirical Study , 1999, J. Artif. Intell. Res..

[44]  Tamás D. Gedeon,et al.  Exploring constructive cascade networks , 1999, IEEE Trans. Neural Networks.

[45]  Eric Bauer,et al.  An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants , 1999, Machine Learning.

[46]  Harris Drucker,et al.  Boosting and Other Ensemble Methods , 1994, Neural Computation.

[47]  Xin Yao,et al.  Evolutionary ensembles with negative correlation learning , 2000, IEEE Trans. Evol. Comput..

[48]  Christian Lebiere,et al.  The Cascade-Correlation Learning Architecture , 1989, NIPS.

[49]  W. Hsu,et al.  Plastic network for predicting the Mackey-Glass time series , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[50]  Kagan Tumer,et al.  Error Correlation and Error Reduction in Ensemble Classifiers , 1996, Connect. Sci..

[51]  A. Khotanzad,et al.  Hand written digit recognition using BKS combination of neural network classifiers , 1994, Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation.

[52]  Dušan Petrovački,et al.  Evolutional development of a multilevel neural network , 1993, Neural Networks.

[53]  Xin Yao,et al.  A new evolutionary system for evolving artificial neural networks , 1997, IEEE Trans. Neural Networks.

[54]  Xin Yao,et al.  Simultaneous training of negatively correlated neural networks in an ensemble , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[55]  Amanda J. C. Sharkey,et al.  On Combining Artificial Neural Nets , 1996, Connect. Sci..

[56]  Jan C. Bioch,et al.  Classification using Bayesian neural nets , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).