Boosting and Other Ensemble Methods

We compare the performance of three neural-network-based ensemble techniques to that of a single neural network. The ensemble algorithms are two versions of boosting and committees of neural networks trained independently. For each of the four algorithms, we experimentally determine the test and training error curves on an optical character recognition (OCR) problem, both as a function of training set size and as a function of computational cost, using three architectures. We show that a single machine is best for small training sets, while for large training sets some version of boosting is best. For a fixed computational cost, however, boosting is always best. Furthermore, we show a surprising result for the original boosting algorithm: as the training set size increases, the training error decreases until it asymptotes to the test error rate. This has potential implications in the search for better training algorithms.
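The abstract does not spell out the ensemble procedures, so the following is a minimal, hypothetical sketch of the two ensemble styles it names: Schapire-style "boosting by filtering" with three learners (the form of the original boosting algorithm used in Drucker et al.'s earlier work) and a committee of independently trained learners that vote. The NearestCentroid base learner, the synthetic data, and all function names are illustrative stand-ins, not the paper's networks or OCR data.

```python
# Sketch (assumed interfaces): original boosting by filtering vs. a voting
# committee. A nearest-centroid stub stands in for a neural network.
import numpy as np

class NearestCentroid:
    """Toy stand-in for a neural network (hypothetical base learner)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # distance of each example to each class centroid; pick the nearest
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def majority_vote(learners, Xt):
    # per-example plurality vote over integer class labels
    votes = np.stack([h.predict(Xt) for h in learners])
    return np.array([np.bincount(col).argmax() for col in votes.T])

def boost_by_filtering(X, y, make_learner, rng):
    """Schapire-style boosting with three learners: h1 trains on the data;
    h2 trains on a set that h1 gets half right and half wrong; h3 trains on
    points where h1 and h2 disagree. Prediction is a majority vote."""
    h1 = make_learner().fit(X, y)
    p1 = h1.predict(X)
    right, wrong = np.where(p1 == y)[0], np.where(p1 != y)[0]
    n = min(len(right), len(wrong))
    if n > 0:
        idx2 = np.concatenate([rng.choice(right, n, replace=False),
                               rng.choice(wrong, n, replace=False)])
    else:
        idx2 = np.arange(len(X))  # degenerate case: fall back to the full set
    h2 = make_learner().fit(X[idx2], y[idx2])
    disagree = np.where(h1.predict(X) != h2.predict(X))[0]
    idx3 = disagree if len(disagree) > 0 else np.arange(len(X))
    h3 = make_learner().fit(X[idx3], y[idx3])
    return lambda Xt: majority_vote((h1, h2, h3), Xt)

def committee(X, y, make_learner, k, rng):
    """Committee of k learners trained independently on resampled data;
    prediction is a plurality vote."""
    members = []
    for _ in range(k):
        idx = rng.choice(len(X), len(X), replace=True)
        members.append(make_learner().fit(X[idx], y[idx]))
    return lambda Xt: majority_vote(members, Xt)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic 3-class blobs standing in for OCR feature vectors
    X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in (0, 3, 6)])
    y = np.repeat([0, 1, 2], 200)
    boosted = boost_by_filtering(X, y, NearestCentroid, rng)
    comm = committee(X, y, NearestCentroid, 3, rng)
    print("boosted training error:  ", (boosted(X) != y).mean())
    print("committee training error:", (comm(X) != y).mean())
```

The filtering step is what distinguishes the original boosting algorithm from a committee: each successive learner is trained only on examples that expose the errors or disagreements of its predecessors, rather than on an independent resample.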
