Maximizing the Margin with Boosting

AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all training examples are classified correctly by the current linear combination, i.e. by a hyperplane in the feature space spanned by the weak hypotheses. This improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane keep increasing even after the training error has reached zero, that is, after all examples lie on the correct side of the hyperplane. We give an iterative version of AdaBoost that explicitly maximizes the minimum margin of the examples. We bound the number of iterations and the number of hypotheses used in the final linear combination, which approximates the maximum margin hyperplane to a prescribed precision. Our modified algorithm essentially retains the exponential convergence properties of AdaBoost, and our result does not depend on the size of the hypothesis class.
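
To make the idea concrete, below is a minimal sketch of one way such a margin-maximizing boosting loop can be organized: each weak hypothesis is weighted relative to a running estimate of the best achievable margin rather than relative to zero, so the combined classifier is pushed toward a larger minimum margin. The decision-stump weak learner, the toy data, and the precision parameter nu are illustrative assumptions for this sketch and are not the paper's exact formulation.

import numpy as np

def stump_learner(X, y, w):
    # Hypothetical weak learner: best threshold stump (feature, threshold, sign) under weights w.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return lambda Z: sign * np.where(Z[:, j] <= thr, 1, -1)

def margin_boost(X, y, T=100, nu=0.1):
    # Boosting loop that aims at the maximum achievable margin up to precision nu (assumed scheme).
    n = len(y)
    w = np.full(n, 1.0 / n)              # distribution over the training examples
    hyps, alphas = [], []
    gamma_min = 1.0                      # smallest edge seen so far
    for t in range(T):
        h = stump_learner(X, y, w)
        gamma = np.sum(w * y * h(X))     # edge of the current weak hypothesis
        gamma = np.clip(gamma, -1 + 1e-12, 1 - 1e-12)
        gamma_min = min(gamma_min, gamma)
        rho = np.clip(gamma_min - nu, -1 + 1e-12, 1 - 1e-12)   # target margin estimate
        # weight the hypothesis relative to the target margin rho instead of relative to 0
        alpha = 0.5 * (np.log((1 + gamma) / (1 - gamma)) - np.log((1 + rho) / (1 - rho)))
        hyps.append(h)
        alphas.append(alpha)
        w *= np.exp(-alpha * y * h(X))   # exponential reweighting as in AdaBoost
        w /= w.sum()
    total = sum(abs(a) for a in alphas)
    def F(Z):
        # normalized linear combination, so y * F(x) is the margin of example (x, y)
        return sum(a * h(Z) for a, h in zip(alphas, hyps)) / total
    return F

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    F = margin_boost(X, y, T=50, nu=0.05)
    print("minimum normalized margin:", (y * F(X)).min())

As a usage note, one would typically monitor the minimum normalized margin over the rounds; under the separability assumption made here it keeps increasing even after the training error is zero, which is the behavior the abstract refers to.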
