On the Existence of Linear Weak Learners and Applications to Boosting

We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve a weighted empirical error on the training set that is bounded from above by 1/2 − γ, γ > 0, for any distribution on the data set. Moreover, for the weak learner to be useful in terms of generalization, γ must be bounded sufficiently far from zero. While the existence of weak learners is essential to the success of boosting algorithms, a proof of their existence based on a geometric point of view has hitherto been lacking. In this work we show that, under certain natural conditions on the data set, a linear classifier is indeed a weak learner. Our results can be directly applied to generalization error bounds for boosting, leading to closed-form bounds. We also provide a procedure for dynamically determining the number of boosting iterations required to achieve low generalization error. The bounds established in this work are based on the theory of geometric discrepancy.
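To illustrate the weak-learning condition stated above, the following minimal Python sketch (not taken from the paper; the function weighted_error and the toy data are hypothetical) computes the distribution-weighted empirical error of a fixed linear classifier and its edge γ = 1/2 − error, which a boosting algorithm would require to be positive for every distribution on the training set.

    import numpy as np

    def weighted_error(w, b, X, y, dist):
        """Weighted empirical error of the linear classifier sign(w.x + b).

        X: (n, d) data matrix, y: labels in {-1, +1},
        dist: non-negative weights summing to 1 (the boosting distribution).
        """
        preds = np.sign(X @ w + b)
        return float(np.sum(dist * (preds != y)))

    # Hypothetical toy example: the weak-learning condition asks that for every
    # distribution `dist` some linear classifier (w, b) has error <= 1/2 - gamma.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)  # linearly separable toy labels
    dist = np.full(100, 1 / 100)                      # uniform boosting distribution
    err = weighted_error(np.array([1.0, 0.3]), 0.0, X, y, dist)
    gamma = 0.5 - err                                 # the "edge" of this weak learner
    print(err, gamma)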
