Formation of clusters and resolution of ordinal attributes in ID3 classification trees

Many learning systems have been designed to construct classification trees from a set of training examples. One of the most widely used approaches for constructing decision trees is the ID3 algorithm [Quinlan 1986]. Decision trees are ill-suited to handling attributes with ordinal values. Problems arise when a node representing an ordinal attribute has a branch for each value of that attribute in the training set, which is generally infeasible when the set of ordinal values is very large. Past approaches have sought to cluster large sets of ordinal values before the classification tree is constructed [Quinlan 1986; Lebowitz 1985; Michalski and Stepp 1983]. Experiments with the Lebowitz gap-finding method resulted in clusters that differentiated classes very poorly. The approach presented in this paper instead dynamically forms single-conclusion and multiple-conclusion (overlap) clusters at the leaves of the decision tree. Domain knowledge is used directly by the overlap resolution algorithms to determine the number of clusters to form. The multiple-conclusion clusters are then resolved using techniques from [Hacke 1990]. Results indicate that the overlap resolution techniques capture the ability to predict the classification of unseen instances. The results of the accuracy-increase and learning-rate experiments are encouraging. Further experiments are performed to determine which techniques account for the maximum accuracy increase. The numerical results of the experiments are presented in this paper.
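
As a rough illustration of what clustering ordinal values by gaps can look like, the sketch below splits a sorted set of values at its widest gaps. It is a generic largest-gap heuristic written for this illustration, not necessarily the procedure of [Lebowitz 1985] and not the overlap-cluster method of this paper; the function name gap_clusters, its parameters, and the sample values are all assumptions.

```python
def gap_clusters(values, n_clusters):
    """Group distinct ordinal values into contiguous clusters.

    Hypothetical helper: cuts the sorted values at the
    (n_clusters - 1) widest gaps between neighbouring values.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    ordered = sorted(set(values))
    if n_clusters >= len(ordered):
        # Not enough distinct values: every value is its own cluster.
        return [[v] for v in ordered]

    # Index i stands for the gap between ordered[i] and ordered[i + 1].
    gap_indices = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
        reverse=True,
    )
    cut_after = sorted(gap_indices[: n_clusters - 1])

    clusters = []
    start = 0
    for i in cut_after:
        clusters.append(ordered[start : i + 1])
        start = i + 1
    clusters.append(ordered[start:])
    return clusters


# Made-up ordinal values, used only to exercise the function.
sample = [1, 2, 3, 10, 11, 12, 30, 31]
clusters = gap_clusters(sample, 3)
```

A tree node would then need one branch per cluster rather than one per distinct value. Note that this heuristic ignores class labels entirely, so nothing ensures that the resulting clusters separate the classes.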