Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more widely applied, by representing other Boolean functions as two-layer neural networks (thresholded convex combinations of Boolean functions). For example, we show that with high probability any decision tree of depth no more than $d$ that is consistent with $m$ training examples has misclassification probability no more than $O\!\left(\left(\tfrac{1}{m}\,N_{\mathrm{eff}}\,\mathrm{VCdim}(U)\,\log^2 m\,\log d\right)^{1/2}\right)$, where $U$ is the class of node decision functions, and $N_{\mathrm{eff}} \le N$ can be thought of as the effective number of leaves (it becomes small as the distribution on the leaves induced by the training data gets far from uniform). This bound is qualitatively different from the VC bound and can be considerably smaller. We use the same technique to give similar results for DNF formulae.
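As an illustrative sketch of the bound's shape (not a verbatim restatement of the paper's theorem), the display below writes the bound and one plausible form of the effective number of leaves; the specific definition of $N_{\mathrm{eff}}$ in terms of the leaf proportions $P_l$ is an assumption made here for illustration.

% Sketch only: assumed form of the effective number of leaves.
\[
  \Pr[\text{error}]
  \;=\; O\!\left(\sqrt{\frac{N_{\mathrm{eff}}\,\mathrm{VCdim}(U)\,\log^2 m\,\log d}{m}}\right),
  \qquad
  N_{\mathrm{eff}} \;=\; \Bigl(\sum_{l=1}^{N}\sqrt{P_l}\Bigr)^{2} \;\le\; N,
\]
% where $P_l$ denotes the fraction of the $m$ training examples reaching leaf $l$.
% If the examples are spread uniformly over the $N$ leaves, $P_l = 1/N$ and
% $N_{\mathrm{eff}} = N$; if they concentrate on a few leaves, $N_{\mathrm{eff}}$
% is much smaller, so the bound can be far below the VC bound, which grows with $N$.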