Totally corrective boosting algorithms that maximize the margin

We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate these updates as minimizing the relative entropy subject to linear constraints. For example, AdaBoost constrains the edge of the last hypothesis w.r.t. the updated distribution to be at most γ = 0; in this sense, AdaBoost is "corrective" only w.r.t. the last hypothesis. A cleaner boosting method is to be "totally corrective": the edges of all past hypotheses are constrained to be at most γ, where γ is suitably adapted. Using new techniques, we prove the same iteration bounds for the totally corrective algorithms as for their corrective versions. Moreover, with an adaptive γ, the algorithms provably maximize the margin. Experimentally, the totally corrective versions return smaller convex combinations of weak hypotheses than the corrective ones and are competitive with LPBoost, a totally corrective boosting algorithm without regularization, for which no iteration bound is known.
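To make the totally corrective update concrete, here is a minimal sketch in Python of one way such a loop could look. It is not the paper's exact algorithm: the helper names `entropy_projection` and `totally_corrective_boost`, the generic `weak_learner` callable, and the adaptive rule `gamma = best_edge - nu` are illustrative assumptions, and the relative-entropy projection onto the edge constraints is solved with an off-the-shelf SLSQP solver rather than a specialized method.

```python
import numpy as np
from scipy.optimize import minimize


def entropy_projection(U, gamma):
    """Minimize KL(d || uniform) over distributions d subject to U @ d <= gamma.

    U is a (T, n) array whose row t holds the signed predictions y_i * h_t(x_i),
    so U @ d is the vector of edges of all past hypotheses w.r.t. d.
    Returns None if the constraint set is (numerically) infeasible.
    """
    n = U.shape[1]

    def kl_to_uniform(d):
        d = np.clip(d, 1e-12, None)
        return float(np.sum(d * np.log(d * n)))

    constraints = [{"type": "eq", "fun": lambda d: np.sum(d) - 1.0}]
    # One inequality per past hypothesis: its edge must be at most gamma.
    constraints += [{"type": "ineq", "fun": lambda d, u=u: gamma - u @ d} for u in U]

    res = minimize(
        kl_to_uniform,
        np.full(n, 1.0 / n),            # start from the uniform distribution
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=constraints,
    )
    return res.x if res.success else None


def totally_corrective_boost(weak_learner, X, y, rounds, nu=0.01):
    """Totally corrective boosting loop (sketch).

    weak_learner(X, y, d) is assumed to return a hypothesis h with h(X) in {-1, +1}^n.
    """
    n = len(y)
    d = np.full(n, 1.0 / n)
    U, hypotheses = [], []
    best_edge = np.inf

    for _ in range(rounds):
        h = weak_learner(X, y, d)
        u = y * h(X)                    # per-example margins of the new hypothesis
        edge = float(d @ u)
        U.append(u)
        hypotheses.append(h)

        # Adaptive constraint value: slightly below the smallest edge seen so far.
        # This is one plausible schedule; nu controls how aggressively gamma shrinks.
        best_edge = min(best_edge, edge)
        gamma = best_edge - nu

        d_new = entropy_projection(np.array(U), gamma)
        if d_new is None:               # constraints infeasible: stop boosting
            break
        d = d_new

    return hypotheses
```

The sketch only returns the list of weak hypotheses; the convex-combination weights over them, mentioned in the abstract, would in a full implementation be recovered from the dual variables of the final projection, a step omitted here.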
