On the Existence of Linear Weak Learners and Applications to Boosting

We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve a weighted empirical error on the training set that is bounded from above by 1/2 − γ, γ > 0, for any distribution on the data set. Moreover, for the weak learner to be useful in terms of generalization, γ must be bounded sufficiently far from zero. While the existence of weak learners is essential to the success of boosting algorithms, a proof of their existence based on a geometric point of view has hitherto been lacking. In this work we show that, under certain natural conditions on the data set, a linear classifier is indeed a weak learner. Our results can be directly applied to generalization error bounds for boosting, leading to closed-form bounds. We also provide a procedure for dynamically determining the number of boosting iterations required to achieve low generalization error. The bounds established in this work are based on the theory of geometric discrepancy.
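To illustrate the weak-learning condition stated above, the following minimal Python sketch (not taken from the paper; the function weighted_error and the toy data are hypothetical) computes the distribution-weighted empirical error of a fixed linear classifier and its edge γ = 1/2 − error, which a boosting algorithm would require to be positive for every distribution on the training set.

    import numpy as np

    def weighted_error(w, b, X, y, dist):
        """Weighted empirical error of the linear classifier sign(w.x + b).

        X: (n, d) data matrix, y: labels in {-1, +1},
        dist: non-negative weights summing to 1 (the boosting distribution).
        """
        preds = np.sign(X @ w + b)
        return float(np.sum(dist * (preds != y)))

    # Hypothetical toy example: the weak-learning condition asks that for every
    # distribution `dist` some linear classifier (w, b) has error <= 1/2 - gamma.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)  # linearly separable toy labels
    dist = np.full(100, 1 / 100)                      # uniform boosting distribution
    err = weighted_error(np.array([1.0, 0.3]), 0.0, X, y, dist)
    gamma = 0.5 - err                                 # the "edge" of this weak learner
    print(err, gamma)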
