A Statistical Approach to Decision Tree Modeling

A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. An efficient algorithm is presented to estimate the parameters in the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.