Learning classification trees

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. The derivation introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987), and CART (Breiman et al., 1984) show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though at a computational cost.
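To make the comparison between the two kinds of splitting rule concrete, the following is a minimal sketch, not the paper's implementation. It computes Quinlan's information gain for a candidate split and, alongside it, a Bayesian-style score: the log marginal likelihood of the class counts in each branch under a symmetric Dirichlet prior. The prior parameter `alpha`, the function names, and the choice of a symmetric Dirichlet prior are assumptions made for illustration; the abstract does not specify the exact form of the paper's rule. The last function shows one simple form of smoothed leaf probabilities under the same assumed prior.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, branches):
    """Quinlan-style information gain of splitting `labels` into `branches`."""
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches if b)
    return entropy(labels) - remainder


def log_dirichlet_evidence(labels, classes, alpha=1.0):
    """Log marginal likelihood of the labels at one leaf.

    Assumes a symmetric Dirichlet(alpha) prior over class probabilities
    (an illustrative assumption, not taken from the abstract).
    """
    counts = Counter(labels)
    k = len(classes)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + len(labels))
    for c in classes:
        result += math.lgamma(alpha + counts[c]) - math.lgamma(alpha)
    return result


def bayes_split_score(branches, classes, alpha=1.0):
    """Sum of per-branch log evidences; higher means a better split."""
    return sum(log_dirichlet_evidence(b, classes, alpha) for b in branches)


def smoothed_leaf_probs(labels, classes, alpha=1.0):
    """Posterior mean class probabilities at a leaf under the same prior."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    return {c: (counts[c] + alpha) / total for c in classes}


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    classes = ["a", "b"]
    pure = [["a", "a"], ["b", "b"]]    # separates the classes
    mixed = [["a", "b"], ["a", "b"]]   # carries no information
    for name, split in (("pure", pure), ("mixed", mixed)):
        print(name,
              information_gain(labels, split),
              bayes_split_score(split, classes))
```

In this toy case both criteria rank the class-separating split above the uninformative one, which is the sense in which the two rules behave similarly. The smoothed estimates never assign zero probability to a class, which hints at how smoothing can stand in for pruning at small leaves.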

[1] Irene A. Stegun, et al. Handbook of Mathematical Functions, 1966.

[2] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.

[3] Leo Breiman, et al. Classification and Regression Trees, 1984.

[4] Ivan Bratko, et al. ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users, 1987, EWSL.

[5] Leland Stewart, et al. Hierarchical Bayesian Analysis using Monte Carlo Integration: Computing Posterior Distributions when, 1987.

[6] J. Ross Quinlan, et al. Simplifying Decision Trees, 1987, Int. J. Man Mach. Stud.

[7] Chris Carter, et al. Assessing Credit Card Applications Using Machine Learning, 1987, IEEE Expert.

[8] Paul Compton, et al. Inductive knowledge acquisition: a case study, 1987.

[9] J. Ross Quinlan, et al. Generating Production Rules from Decision Trees, 1987, IJCAI.

[10] Sholom M. Weiss, et al. Optimizing the Predictive Value of Diagnostic Decision Rules, 1987, AAAI.

[11] David J. Spiegelhalter, et al. Local computations with probabilities on graphical structures and their application to expert systems, 1990.

[12] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.

[13] Chris Carter, et al. Multiple decision trees, 2013, UAI.

[14] T. L. McCluskey. Progress in machine learning - Proceedings of EWSL 87: Second European working session on learning by I. Bratko and N. Lavrac (eds.), Sigma Press, pp 256, £14.95, 1989, Knowl. Eng. Rev.

[15] Ronald L. Rivest, et al. Inferring Decision Trees Using the Minimum Description Length Principle, 1989, Inf. Comput.

[16] Stuart L. Crawford. Extensions to the CART Algorithm, 1989, Int. J. Man Mach. Stud.

[17] Lalit R. Bahl, et al. A tree-based statistical language model for natural language speech recognition, 1989, IEEE Trans. Acoust. Speech Signal Process.

[18] L. Joseph, et al. Bayesian Statistics: An Introduction, 1989.

[19] C. C. Rodriguez. Objective Bayesianism and Geometry, 1990.

[20] Michael J. Grimble, et al. Knowledge-based systems for industrial control, 1990.

[21] Donald Michie, et al. Cognitive models from subcognitive skills, 1990.

[22] Sholom M. Weiss, et al. Maximizing the Predictive Value of Production Rules, 1990, Artif. Intell.

[23] Wray L. Buntine, et al. A theory of learning classification rules, 1990.

[24] Philip A. Chou, et al. Optimal Partitioning for Classification and Regression Trees, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[25] Wray L. Buntine, et al. Introduction in IND and recursive partitioning, 1991.

[26] Wray L. Buntine, et al. Bayesian Back-Propagation, 1991, Complex Syst.