The Boosting Approach to Machine Learning: An Overview

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter surveys recent work on boosting, including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.
