
Benchmarking Open-Source Tree Learners in R/RWeka

The two most popular classification tree algorithms in machine learning and statistics, C4.5 and CART, are compared in a benchmark experiment together with two more recent constant-fit tree learners from the statistics literature (QUEST and conditional inference trees). The study assesses both misclassification error and model complexity on bootstrap replications of 18 benchmark data sets. It is carried out in the R system for statistical computing. This is made possible by the RWeka package, which interfaces R to the open-source machine learning toolbox Weka. C4.5 and CART are found to be competitive in terms of misclassification error, although the performance difference between them varies clearly across data sets. However, C4.5 tends to grow larger and thus more complex trees.
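The evaluation loop behind such a benchmark can be sketched as follows. This is a minimal, hedged illustration in Python, not the study's R/RWeka code. It assumes that each bootstrap replication fits a learner on the resampled cases and scores misclassification on the out-of-bag cases (those never drawn). It also substitutes a hypothetical one-split decision stump (`fit_stump`) for C4.5, CART, QUEST and conditional inference trees. Model complexity is crudely approximated here by the number of leaves. The function names, the toy data and that complexity measure are illustrative assumptions.

```python
import random


def majority(labels, default):
    """Most frequent label; ties go to the smallest label so results are deterministic."""
    if not labels:
        return default
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(X, y):
    """Hypothetical stand-in for a tree learner: a single split on feature 0.

    Returns (predict, n_leaves); n_leaves is a crude complexity measure.
    """
    default = majority(y, None)
    best_err, best_rule = len(y) + 1, None
    for t in sorted({x[0] for x in X}):
        left = [lab for x, lab in zip(X, y) if x[0] <= t]
        right = [lab for x, lab in zip(X, y) if x[0] > t]
        l_lab, r_lab = majority(left, default), majority(right, default)
        # Training misclassifications if each side predicts its majority label.
        err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if err < best_err:
            best_err, best_rule = err, (t, l_lab, r_lab)
    t, l_lab, r_lab = best_rule

    def predict(x):
        return l_lab if x[0] <= t else r_lab

    return predict, 2  # a stump always has exactly two leaves


def bootstrap_benchmark(X, y, fit, n_reps=50, seed=1):
    """Fit on bootstrap samples and score misclassification on out-of-bag cases."""
    rng = random.Random(seed)
    n = len(y)
    errors, sizes = [], []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # cases never drawn
        if not oob:
            continue  # nothing to evaluate on in this replication
        predict, n_leaves = fit([X[i] for i in idx], [y[i] for i in idx])
        wrong = sum(predict(X[i]) != y[i] for i in oob)
        errors.append(wrong / len(oob))
        sizes.append(n_leaves)
    return errors, sizes


if __name__ == "__main__":
    # Toy data: one numeric feature; the class switches at 20.
    X = [[i] for i in range(40)]
    y = [0 if i < 20 else 1 for i in range(40)]
    errors, sizes = bootstrap_benchmark(X, y, fit_stump, n_reps=30, seed=7)
    print(f"replications: {len(errors)}")
    print(f"mean OOB misclassification: {sum(errors) / len(errors):.3f}")
    print(f"mean number of leaves: {sum(sizes) / len(sizes):.1f}")
```

The random generator is used only to draw bootstrap indices. Calling `bootstrap_benchmark` with the same `seed` therefore reproduces the same resamples, so several learners with the same `(X, y) -> (predict, n_leaves)` contract can be compared on identical bootstrap samples.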

Kurt Hornik | Achim Zeileis | David Meyer | Michael Schauerhuber
