Improved Boosting Algorithms Using Confidence-rated Predictions

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees that turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case that is simpler than, but as effective as, the techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
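
To make the confidence-rated setting concrete, the sketch below shows binary boosting with real-valued weak hypotheses: each prediction lies in [-1, +1], where the sign is the predicted label and the magnitude is the confidence. This is a minimal illustration under stated assumptions, not a reference implementation of the paper's algorithms; in particular, the `fit_weak_learner` callback, its interface, and `n_rounds` are placeholders introduced only for this example.

```python
import numpy as np

def confidence_rated_adaboost(X, y, fit_weak_learner, n_rounds=100):
    """Boosting with confidence-rated (real-valued) weak hypotheses.

    Assumptions made for this sketch:
      * y is an array of binary labels in {-1, +1};
      * fit_weak_learner(X, y, D) returns a callable h with h(X) in [-1, +1],
        trained to do well under the example distribution D.
    """
    y = np.asarray(y, dtype=float)
    m = len(y)
    D = np.full(m, 1.0 / m)            # distribution over training examples
    ensemble = []                      # list of (alpha_t, h_t) pairs

    for _ in range(n_rounds):
        h = fit_weak_learner(X, y, D)
        margins = y * h(X)             # y_i * h_t(x_i), each in [-1, +1]

        # Correlation r_t = sum_i D_t(i) y_i h_t(x_i).  For predictions in
        # [-1, +1], alpha_t = 1/2 ln((1 + r_t) / (1 - r_t)) minimizes an upper
        # bound on the normalizer Z_t = sum_i D_t(i) exp(-alpha_t y_i h_t(x_i));
        # the product of the Z_t over rounds bounds the training error of the
        # final combined hypothesis.
        r = np.clip(np.dot(D, margins), -1 + 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1.0 + r) / (1.0 - r))

        # Reweight: misclassified or low-confidence examples gain weight.
        D = D * np.exp(-alpha * margins)
        D = D / D.sum()                # renormalize (divide by Z_t)

        ensemble.append((alpha, h))

    def final_hypothesis(X_new):
        """Sign of the confidence-weighted vote of all weak hypotheses."""
        return np.sign(sum(alpha * h(X_new) for alpha, h in ensemble))

    return final_hypothesis
```

For ordinary ±1-valued weak hypotheses, r_t = 1 - 2ε_t, so this choice of alpha_t reduces to the familiar AdaBoost weight ½ ln((1 - ε_t)/ε_t). The paper's multi-label methods (e.g., AdaBoost.MH) apply essentially the same update over (example, label) pairs rather than over examples alone.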

[1] R. Fletcher. Practical Methods of Optimization, 1988.

[2] David Haussler, et al. What Size Net Gives Valid Generalization?, 1989, Neural Computation.

[3] Thomas G. Dietterich, et al. In Advances in Neural Information Processing Systems 12, 1991, NIPS.

[4] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[5] A. Jefferson Offutt, et al. An Empirical Evaluation, 1994.

[6] Avrim Blum, et al. Empirical Support for Winnow and Weighted-Majority Based Algorithms: Results on a Calendar Scheduling Domain, 1995, ICML.

[7] Philip M. Long, et al. A Generalization of Sauer's Lemma, 1995, J. Comb. Theory, Ser. A.

[8] Corinna Cortes, et al. Boosting Decision Trees, 1995, NIPS.

[9] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[10] Thomas G. Dietterich, et al. Solving Multiclass Learning Problems via Error-Correcting Output Codes, 1994, J. Artif. Intell. Res.

[11] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[12] J. Ross Quinlan, et al. Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[13] Yoshua Bengio, et al. Training Methods for Adaptive Boosting of Neural Networks, 1997, NIPS.

[14] David W. Opitz, et al. An Empirical Evaluation of Bagging and Boosting, 1997, AAAI/IAAI.

[15] Yoram Singer, et al. Using and combining predictors that specialize, 1997, STOC '97.

[16] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[17] Thomas G. Dietterich, et al. Pruning Adaptive Boosting, 1997, ICML.

[18] Robert E. Schapire, et al. Using output codes to boost multiclass learning problems, 1997, ICML.

[19] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.

[20] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 1998, ICML.

[21] Yoram Singer, et al. BoosTexter: A System for Multiclass Multi-label Text Categorization, 1998.

[22] L. Breiman. Arcing Classifiers, 1998.

[23] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[24] Yishay Mansour, et al. On the Boosting Ability of Top-Down Decision Tree Learning Algorithms, 1999, J. Comput. Syst. Sci.

[25] J. Friedman. Additive logistic regression: A statistical view of boosting, 2000.