On growing better decision trees from data

This thesis investigates the problem of growing decision trees from data, for the purposes of classification and prediction. After a comprehensive, multi-disciplinary survey of work on decision trees, some algorithmic extensions to existing tree growing methods are considered. The implications of using (1) less greedy search and (2) less restricted splits at tree nodes are systematically studied. Extending the traditional axis-parallel splits to oblique splits is shown to be practical and beneficial for a variety of problems. However, the use of more extensive search heuristics than the traditional greedy heuristic is argued to be unnecessary, and often harmful. Any effort to build good decision trees from real-world data involves "massaging" the data into a suitable form. Two forms of data massaging, domain-independent and domain-specific, are distinguished in this work. A new framework is outlined for the former, and the importance of the latter is illustrated in the context of two new, complex classification problems in astronomy. Highly accurate and small decision tree classifiers are built for both these problems through a collaborative effort with astronomers.

[1]  S. C. Johnson Hierarchical clustering schemes , 1967, Psychometrika.

[2]  Michael Lebowitz Categorizing numeric information for generalization , 1985 .

[3]  J. Ross Quinlan,et al.  Unknown Attribute Values in Induction , 1989, ML.

[4]  A S Houston,et al.  Evaluation of the use of induction in the development of a medical expert system. , 1994, Computers and biomedical research, an international journal.

[5]  Leland Stewart,et al.  Hierarchical Bayesian Analysis using Monte Carlo Integration: Computing Posterior Distributions when , 1987 .

[6]  Jack Sklansky,et al.  Training a One-Dimensional Classifier to Minimize the Probability of Error , 1972, IEEE Trans. Syst. Man Cybern..

[7]  Robert A. Pearson,et al.  Vector Evaluation in Induction Algorithms , 1990, Int. J. High Speed Comput..

[8]  Neil A. B. Gray,et al.  Capturing knowledge through top-down induction of decision trees , 1990, IEEE Expert.

[9]  Jack Sklansky,et al.  Feature Selection for Automatic Classification of Non-Gaussian Data , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[10]  Gary J. Koehler,et al.  An investigation on the conditions of pruning an induced decision tree , 1994 .

[11]  R Luthringer,et al.  Statistical Decision Tree: a tool for studying pharmaco-EEG effects of CNS-active drugs. , 1994, Neuropsychobiology.

[12]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[13]  Monica Chiogna,et al.  Expert derived automatically generated classification trees: an example from pediatric cardiology , 1993, Proceedings of Computers in Cardiology Conference.

[14]  C. Y. Lee Representation of switching circuits by binary-decision programs , 1959 .

[15]  Bowser-Chao,et al.  Comparison of the use of binary decision trees and neural networks in top-quark detection. , 1993, Physical review. D, Particles and fields.

[16]  Edward J. Delp,et al.  An Iterative Growing and Pruning Algorithm for Classification Tree Design , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  James M. Keller,et al.  Uncertainty management for rule-based systems with applications to image analysis , 1994, IEEE Trans. Syst. Man Cybern..

[18]  Richard P. Brent,et al.  Fast training algorithms for multilayer neural nets , 1991, IEEE Trans. Neural Networks.

[19]  Wray L. Buntine,et al.  Introduction in IND and recursive partitioning , 1991 .

[20]  Yves Kodratoff,et al.  Machine Learning for Object Recognition and Scene Analysis , 1994, Int. J. Pattern Recognit. Artif. Intell..

[21]  Sheldon B. Akers,et al.  Binary Decision Diagrams , 1978, IEEE Transactions on Computers.

[22]  Sung-Ho Kim,et al.  A general property among nested, pruned subtrees of a decision-support tree , 1994 .

[23]  Jason Catlett,et al.  Experiments on the Costs and Benefits of Windowing in ID3 , 1988, ML.

[24]  David E. Boyce,et al.  Optimal Subset Selection , 1974 .

[25]  Robert J. McQueen,et al.  Applying machine learning to agricultural data , 1995 .

[26]  Dana S. Nau Decision Quality As a Function of Search Depth on Game Trees , 1983, JACM.

[27]  W. Loh,et al.  Tree-Structured Classification via Generalized Discriminant Analysis. , 1988 .

[28]  Subhash C. NarulaI,et al.  The Minimum Sum of Absolute Errors Regression: A State of the Art Survey , 1982 .

[29]  Chin-Liang Chang,et al.  Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.

[30]  Younkyung Cha Kang Randomized Algorithms for Query Optimization , 1991 .

[31]  J. R. Quinlan Probabilistic decision trees , 1990 .

[32]  Kristin P. Bennett Machine learning via mathematical programming , 1993 .

[33]  Richard S. Forsyth,et al.  Overfitting revisited: an information-theoretic approach to simplifying discrimination trees , 1994, J. Exp. Theor. Artif. Intell..

[34]  Donald H. Foley Considerations of sample and feature size , 1972, IEEE Trans. Inf. Theory.

[35]  Oren Etzioni,et al.  Representation design and brute-force induction in a Boeing manufacturing domain , 1994, Appl. Artif. Intell..

[36]  David Mutchler The Multi-Player Version of Minimax Displays Game-Tree Pathology , 1993, Artif. Intell..

[37]  Leo Breiman,et al.  Stacked regressions , 2004, Machine Learning.

[38]  Anthony N. Mucciardi,et al.  A Comparison of Seven Techniques for Choosing Subsets of Pattern Recognition Properties , 1971, IEEE Transactions on Computers.

[39]  R. Gray,et al.  Vector quantization , 1984, IEEE ASSP Magazine.

[40]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[41]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[42]  Bernard M. E. Moret,et al.  The Activity of a Variable and Its Relation to Decision Trees , 1980, TOPL.

[43]  Janice D. Callahan,et al.  Rule Induction for Group Decisions with Statistical Data — An Example , 1991 .

[44]  A. Famili,et al.  Use of decision-tree induction for process optimization and knowledge refinement of an industrial process , 1994, Artificial Intelligence for Engineering Design, Analysis and Manufacturing.

[45]  Sartaj Sahni,et al.  Approximate Algorithms for the 0/1 Knapsack Problem , 1975, JACM.

[46]  Toby Walsh,et al.  An Empirical Analysis of Search in GSAT , 1993, J. Artif. Intell. Res..

[47]  Ron Rymon,et al.  Automatic cataloguing and characterization of earth science data using set enumeration trees , 1994 .

[48]  Olvi L. Mangasarian,et al.  Mathematical Programming in Neural Networks , 1993, INFORMS J. Comput..

[49]  Randal E. Bryant,et al.  Graph-Based Algorithms for Boolean Function Manipulation , 1986, IEEE Transactions on Computers.

[50]  PracticeLutz Prechelt,et al.  A Study of Experimental Evaluationsof Neural Network Learning Algorithms : Current Research , 1994 .

[51]  Nicolino J. Pizzi,et al.  Comparative review of knowledge engineering and inductive learning using data in a medical domain , 1990, Defense, Security, and Sensing.

[52]  S. Sitharama Iyengar,et al.  Efficient algorithms to globally balance a binary search tree , 1984, CACM.

[53]  Ronald L. Graham,et al.  On the History of the Minimum Spanning Tree Problem , 1985, Annals of the History of Computing.

[54]  K. S. Fu,et al.  An Approach to the Design of a Linear Binary Tree Classifier , 1976 .

[55]  Tim Niblett,et al.  Constructing Decision Trees in Noisy Domains , 1987, EWSL.

[56]  David A. Landgrebe,et al.  A survey of decision tree classifier methodology , 1991, IEEE Trans. Syst. Man Cybern..

[57]  M. Kurzynski The optimal strategy of a tree classifier , 1983 .

[58]  E. Stanford EXPERT SYSTEMS IN THE 1980 s , 2022 .

[59]  Padhraic Smyth,et al.  Decision tree design using information theory , 1990, Knowledge Acquisition.

[60]  G. V. Kass An Exploratory Technique for Investigating Large Quantities of Categorical Data , 1980 .

[61]  Donald E. Brown,et al.  A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems , 1992, Pattern Recognit..

[62]  Paul E. Utgoff,et al.  Perceptron Trees : A Case Study in ybrid Concept epresentations , 1999 .

[63]  William A. Belson,et al.  Matching and Prediction on the Principle of Biological Classification , 1959 .

[64]  John R. Koza,et al.  Concept Formation and Decision Tree Induction Using the Genetic Programming Paradigm , 1990, PPSN.

[65]  C. S. Wallace,et al.  An Information Measure for Classification , 1968, Comput. J..

[66]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[67]  Hugh B. Woodruff,et al.  An algorithm for a selective nearest neighbor decision rule (Corresp.) , 1975, IEEE Trans. Inf. Theory.

[68]  Carla E. Brodley Recursive automatic algorithm selection for inductive learning , 1995 .

[69]  Donald Michie,et al.  The superarticulacy phenomenon in the context of software manufacture , 1990 .

[70]  O. Mangasarian,et al.  Pattern Recognition Via Linear Programming: Theory and Application to Medical Diagnosis , 1989 .

[71]  R. Gray,et al.  Speech coding based upon vector quantization , 1980, ICASSP.

[72]  O. Mangasarian,et al.  Robust linear programming discrimination of two linearly inseparable sets , 1992 .

[73]  D. A. Preece,et al.  Identification Keys and Diagnostic Tables: a Review , 1980 .

[74]  J. Ross Quinlan,et al.  An Empirical Comparison of Genetic and Decision-Tree Classifiers , 1988, ML.

[75]  Pat Langley,et al.  Models of Incremental Concept Formation , 1990, Artif. Intell..

[76]  Padhraic Smyth,et al.  Decision tree design from a communication theory standpoint , 1988, IEEE Trans. Inf. Theory.

[77]  B. Silverman,et al.  Block diagrams and splitting criteria for classification trees , 1993 .

[78]  S. Djorgovski,et al.  Automated Star/Galaxy Classification for Digitized Poss-II , 1995 .

[79]  Jean-Pierre Nadal,et al.  Neural trees: a new tool for classification , 1990 .

[80]  I. Burhan Türksen,et al.  An equivalence between inductive learning and pseudo-Boolean logic simplification: a rule generation and reduction scheme , 1993, IEEE Trans. Syst. Man Cybern..

[81]  James A. Storer,et al.  Design and Performance of Tree-Structured Vector Quantizers , 1994, Inf. Process. Manag..

[82]  M. Milosavljevic,et al.  On the influence of the training set data preprocessing on neural networks training , 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems.

[83]  I. Bratko,et al.  Information-based evaluation criterion for classifier's performance , 2004, Machine Learning.

[84]  S. J. Park,et al.  PRTSM: Pattern recognition-based time series modeler , 1989 .

[85]  Thierry Van de Merckt NFDT: A System that Learns Flexible Concepts Based on Decision Trees for Numerical Attributes , 1992, ML.

[86]  L. A. Cox,et al.  Heuristic least-cost computation of discrete classification functions with uncertain argument values , 1990 .

[87]  J. Sklansky,et al.  Automated design of multiple-class piecewise linear classifiers , 1989 .

[88]  Ellis Horowitz,et al.  Fundamentals of Computer Algorithms , 1978 .

[89]  Keinosuke Fukunaga,et al.  A Branch and Bound Algorithm for Feature Subset Selection , 1977, IEEE Transactions on Computers.

[90]  S. Odewahn,et al.  Automated star/galaxy discrimination with neural networks , 1992 .

[91]  Marek Kurzynski On the identity of optimal strategies for multistage classifiers , 1989, Pattern Recognit. Lett..

[92]  J. Sklansky,et al.  Pattern classifiers and trainable machines : with 117 illustrations , 1981 .

[93]  Nimrod Megiddo,et al.  On the complexity of polyhedral separability , 1988, Discret. Comput. Geom..

[94]  John Mingers,et al.  Neural Networks, Decision Tree Induction and Discriminant Analysis: an Empirical Comparison , 1994 .

[95]  Marek Kurzynski,et al.  On the multistage Bayes classifier , 1988, Pattern Recognit..

[96]  Stuart L. Crawford Extensions to the CART Algorithm , 1989, Int. J. Man Mach. Stud..

[97]  J. Carbonell,et al.  Technical Note A Distance-Based Attribute for Decision Tree Induction , 1991 .

[98]  Belur V. Dasarathy,et al.  Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design , 1994, IEEE Trans. Syst. Man Cybern..

[99]  Gert Cauwenberghs,et al.  A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization , 1992, NIPS.

[100]  R. Bucy,et al.  Decision tree design by simulated annealing , 1993 .

[101]  Emile H. L. Aarts,et al.  Simulated Annealing: Theory and Applications , 1987, Mathematics and Its Applications.

[102]  C. G. Hilborn,et al.  The Condensed Nearest Neighbor Rule , 1967 .

[103]  Ren Wang Qing Decision tree approach to pattern recognition problems in a large character set , 1984 .

[104]  Ren C. Luo,et al.  Object identification using automated decision tree construction approach for robotics applications , 1987, J. Field Robotics.

[105]  Florence d'Alché-Buc,et al.  Trio Learning: A New Strategy for Building Hybrid Neural Trees , 1994, Int. J. Neural Syst..

[106]  Eve A. Riskin,et al.  Lookahead in growing tree-structured vector quantizers , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[107]  R.M. Gray,et al.  A greedy tree growing algorithm for the design of variable rate vector quantizers [image compression] , 1991, IEEE Trans. Signal Process..

[108]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[109]  Chris Carter,et al.  Multiple decision trees , 2013, UAI.

[110]  Kristin P. Bennett,et al.  Decision Tree Construction Via Linear Programming , 1992 .

[111]  P.-L. Tu,et al.  A new decision-tree classification algorithm for machine learning , 1992, Proceedings Fourth International Conference on Tools with Artificial Intelligence TAI '92.

[112]  M F Collen,et al.  Towards automated medical decisions. , 1972, Computers and biomedical research, an international journal.

[113]  Masahiro Miyakawa Criteria for Selecting a Variable in the Construction of Efficient Decision Trees , 1989, IEEE Trans. Computers.

[114]  Pramod K. Varshney,et al.  Application of Information Theory to Sequential Fault Diagnosis , 1982, IEEE Transactions on Computers.

[115]  Richard H. Roth An Approach to Solving Linear Discrete Optimization Problems , 1970, JACM.

[116]  Michael I. Jordan,et al.  Learning in Boltzmann Trees , 1994, Neural Computation.

[117]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[118]  Seth Zimmerman An Optimal Search Procedure , 1959 .

[119]  P. P. Chakrabarti,et al.  Improving Greedy Algorithms by Lookahead-Search , 1994, J. Algorithms.

[120]  K. Pattipati,et al.  Application of heuristic search and information theory to sequential fault diagnosis , 1988, Proceedings IEEE International Symposium on Intelligent Control 1988.

[121]  Jerome H. Friedman,et al.  A Recursive Partitioning Decision Rule for Nonparametric Classification , 1977, IEEE Transactions on Computers.

[122]  I. K. Sethi,et al.  Hierarchical Classifier Design Using Mutual Information , 1982, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[123]  Douglas Comer,et al.  The Complexity of Trie Index Construction , 1977, JACM.

[124]  Laveen N. Kanal,et al.  Pattern Recognition in Practice V , 1997, Pattern Recognit. Lett..

[125]  Alan J. Miller Subset Selection in Regression , 1992 .

[126]  Ching Y. Suen,et al.  ISOETRP - an interactive clustering algorithm with new objectives , 1984, Pattern Recognit..

[127]  J. Morgan,et al.  Thaid a Sequential Analysis Program for the Analysis of Nominal Scale Dependent Variables , 1973 .

[128]  Thomas G. Dietterich,et al.  A Comparative Review of Selected Methods for Learning from Examples , 1983 .

[129]  Ayumi Shinohara,et al.  Knowledge Acquisition from Amino Acid Sequences by Machine Learning System BONSAI , 1992 .

[130]  Simon Kasif,et al.  A System for Induction of Oblique Decision Trees , 1994, J. Artif. Intell. Res..

[131]  Jeffrey Scott Vitter,et al.  Nearly optimal vector quantization via linear programming , 1992, Data Compression Conference, 1992..

[132]  Luc Devroye,et al.  Automatic Pattern Recognition: A Study of the Probability of Error , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[133]  Sanchoy K. Das,et al.  A decision tree approach for selecting between demand based, reorder, and JIT/kanban methods for material procurement , 1994 .

[134]  Thomas G. Dietterich,et al.  Learning Boolean Concepts in the Presence of Many Irrelevant Features , 1994, Artif. Intell..

[135]  M J English,et al.  Accurate segmentation of respiration waveforms from infants enabling identification and classification of irregular breathing patterns. , 1994, Medical engineering & physics.

[136]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[137]  Seong-Whan Lee,et al.  Noisy Hangul character recognition with fuzzy tree classifier , 1992, Electronic Imaging.

[138]  J. R. Quinlan,et al.  Comparing connectionist and symbolic learning methods , 1994, COLT 1994.

[139]  Philip A. Chou,et al.  Optimal Partitioning for Classification and Regression Trees , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[140]  David A. Landgrebe,et al.  The decision tree approach to classification , 1975 .

[141]  Richard L. White,et al.  DECISION TREES FOR AUTOMATED IDENTIFICATION OF COSMIC-RAY HITS IN HUBBLE SPACE TELESCOPE IMAGES , 1995 .

[142]  James A. Storer,et al.  Optimal Pruning for Tree-Structured Vector Quantization , 1992, Inf. Process. Manag..

[143]  Fritz Wysotzki,et al.  Automatic construction of decision trees for classification , 1994, Ann. Oper. Res..

[144]  N. Draper,et al.  Applied Regression Analysis. , 1967 .

[145]  A S Houston,et al.  The use of induction in the design of an expert system for thyroid function studies , 1991, Nuclear medicine communications.

[146]  David L. Verbyla,et al.  Classification and Regression Tree Analysis for Assessing Hazard of Pine Mortality Caused by Heterobasidion annosum , 1993 .

[147]  Ashok V. Kulkarni On the Mean Accuracy of Hierarchical Classifiers , 1978, IEEE Transactions on Computers.

[148]  Moshe Ben-Bassat,et al.  Myopic Policies in Sequential Classification , 1978, IEEE Transactions on Computers.

[149]  Ching Y. Suen,et al.  Large Tree Classifier with Heuristic Search and Global Training , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[150]  King-Sun Fu,et al.  A Nonparametric Partitioning Procedure for Pattern Classification , 1969, IEEE Transactions on Computers.

[151]  J. Ross Quinlan,et al.  Simplifying Decision Trees , 1987, Int. J. Man Mach. Stud..

[152]  Wayne Ieee,et al.  Entropy Nets: From Decision Trees to Neural Networks , 1990 .

[153]  Michael J. Shaw,et al.  Learning-based scheduling in a flexible manufacturing flow line , 1994 .

[154]  Philip M. Lewis,et al.  The characteristic selection problem in recognition systems , 1962, IRE Trans. Inf. Theory.

[155]  Thomas R. Hancock Learning kμ decision trees on the uniform distribution , 1993, COLT '93.

[156]  Jerome H. Friedman,et al.  Flexible Metric Nearest Neighbor Classification , 1994 .

[157]  MANABU ICHINO,et al.  Optimum feature selection by zero-one integer programming , 1984, IEEE Transactions on Systems, Man, and Cybernetics.

[158]  Gabor T. Herman,et al.  On Piecewise-Linear Classification , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[159]  J. Ross Quinlan,et al.  Decision trees and decision-making , 1990, IEEE Trans. Syst. Man Cybern..

[160]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[161]  Ming Tan,et al.  Cost-sensitive learning of classification knowledge and its applications in robotics , 2004, Machine Learning.

[162]  Judea Pearl,et al.  ON THE CONNECTION BETWEEN THE COMPLEXITY AND CREDIBILITY OF INFERRED MODELS , 1978 .

[163]  Leo Breiman,et al.  [The ∏ Method for Estimating Multivariate Functions from Noisy Data]: Response , 1991 .

[164]  Tharam S. Dillon,et al.  A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[165]  S. Djorgovski,et al.  The discovery of five quasars at z>4 using the Second Palomar Sky Survey , 1995 .

[166]  S. Djorgovski,et al.  Initial Galaxy Counts from Digitized Poss-II , 1995 .

[167]  William A. Wallace,et al.  Induction of Rules Subject to a Quality Constraint: Probabilistic Inductive Learning , 1993, IEEE Trans. Knowl. Data Eng..

[168]  Behrokh Khoshnevis,et al.  Machine Learning and Simulation: Application in Queuing Systems , 1993, Simul..

[169]  Derek L. Nazareth,et al.  Investigating the effectiveness of conditional classification: an application to manufacturing scheduling , 1994 .

[170]  M. Shaw,et al.  Induction of fuzzy decision trees , 1995 .

[171]  William M. Spears,et al.  An Artificial Intelligence Approach to Analog Systems Diagnosis , 1991 .

[172]  Michel Manago,et al.  Generalization and Noise , 1987, Int. J. Man Mach. Stud..

[173]  B. Chandrasekaran,et al.  Quantization Complexity and Independent Measurements , 1974, IEEE Transactions on Computers.

[174]  Mihalis Yannakakis,et al.  How easy is local search? , 1985, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).

[175]  Jan Karel Lenstra,et al.  A Computational Study of Local Search Algorithms for Job Shop Scheduling , 1994, INFORMS J. Comput..

[176]  Chris Carter,et al.  Assessing Credit Card Applications Using Machine Learning , 1987, IEEE Expert.

[177]  O. Mangasarian,et al.  Multicategory discrimination via linear programming , 1994 .

[178]  G. F. Hughes,et al.  On the mean accuracy of statistical pattern recognizers , 1968, IEEE Trans. Inf. Theory.

[179]  Ishwar K. Sethi,et al.  Efficient decision tree design for discrete variable pattern recognition problems , 1977, Pattern Recognition.

[180]  Gerhard W. Dueck,et al.  Threshold accepting: a general purpose optimization algorithm appearing superior to simulated anneal , 1990 .

[181]  Gary J. Koehler,et al.  PAC-learning a decision tree with pruning , 1996 .

[182]  Y In A NEW INDUCTIVE LEARNING ALGORITHM——SEPARABILITY-BASED INDUCTIVE LEARNING ALGORITHM , 1993 .

[183]  P. Utgoff,et al.  Multivariate Versus Univariate Decision Trees , 1992 .

[184]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[185]  Saul B. Gelfand,et al.  Classification trees with neural network feature extraction , 1992, IEEE Trans. Neural Networks.

[186]  Jean-Loup Baer,et al.  A comparison of tree-balancing algorithms , 1977, CACM.

[187]  Jack Sklansky,et al.  On Automatic Feature Selection , 1988, Int. J. Pattern Recognit. Artif. Intell..

[188]  Leo Breiman,et al.  Hinging hyperplanes for regression, classification, and function approximation , 1993, IEEE Trans. Inf. Theory.

[189]  Russell Greiner,et al.  Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction , 1997 .

[190]  Allan P. White,et al.  The importance of attribute selection measures in decision tree induction , 2005, Machine Learning.

[191]  William S. Meisel,et al.  A Partitioning Algorithm with Application in Pattern Classification and the Optimization of Decision Trees , 1973, IEEE Transactions on Computers.

[192]  Carla E. Brodley,et al.  Linear Machine Decision Trees , 1991 .

[193]  O. J. Murphy,et al.  Designing Storage Efficient Decision Trees , 1991, IEEE Trans. Computers.

[194]  R. Gray,et al.  Applications of information theory to pattern recognition and the design of decision trees and trellises , 1988 .

[195]  David Haussler,et al.  Learning decision trees from random examples , 1988, COLT '88.

[196]  Godfried T. Toussaint,et al.  Bibliography on estimation of misclassification , 1974, IEEE Trans. Inf. Theory.

[197]  Jan L. Talmon A multiclass nonparametric partitioning algorithm , 1986, Pattern Recognit. Lett..

[198]  Wendy G. Lehnert,et al.  Inductive text classification for medical applications , 1995, J. Exp. Theor. Artif. Intell..

[199]  Marcus R. Frean,et al.  Small nets and short paths : optimising neural computation , 1990 .

[200]  Cris Koutsougeras,et al.  A Hybrid Electro-Optical Architecture for Classification Trees and Associative Memory Mechanisms , 1993, Int. J. Artif. Intell. Tools.

[201]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[202]  Kenneth L. McMillan,et al.  Symbolic model checking: an approach to the state explosion problem , 1992 .

[203]  Shailendra C. Palvia,et al.  Tables, trees and formulas in decision analysis , 1992, CACM.

[204]  Seymour Shlien,et al.  Multiple binary decision tree classifiers , 1990, Pattern Recognit..

[205]  Saul B. Gelfand,et al.  A tree-structured piecewise linear adaptive filter , 1993, IEEE Trans. Inf. Theory.

[206]  C. L. Pittard,et al.  Classification trees with optimal multi-variate splits , 1993, Proceedings of IEEE Systems Man and Cybernetics Conference - SMC.

[207]  Laveen N. Kanal,et al.  Patterns in pattern recognition: 1968-1974 , 1974, IEEE Trans. Inf. Theory.

[208]  Rajiv Gupta,et al.  On randomization in sequential and distributed algorithms , 1994, CSUR.

[209]  Jack Sklansky,et al.  Locally Trained Piecewise Linear Classifiers , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[210]  Charles T. Zahn,et al.  Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters , 1971, IEEE Transactions on Computers.

[211]  Carla E. Brodley,et al.  An Incremental Method for Finding Multivariate Splits for Decision Trees , 1990, ML.

[212]  Ivan Bratko,et al.  ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users , 1987, EWSL.

[213]  Seymour Shlien Nonparametric classification using matched binary decision trees , 1992, Pattern Recognit. Lett..

[214]  G. Kalkanis,et al.  The application of confidence interval error analysis to the design of decision tree classifiers , 1993, Pattern Recognit. Lett..

[215]  Michael J. Kurtz Astronomical object classification , 1988 .

[216]  Dean Philip McKenzie,et al.  The construction of computerized classification systems using machine learning algorithms: An overview , 1992 .

[217]  David A. Belsley,et al.  Regression Analysis and its Application: A Data-Oriented Approach.@@@Applied Linear Regression.@@@Regression Diagnostics: Identifying Influential Data and Sources of Collinearity , 1981 .

[218]  E. M. Rounds A combined nonparametric approach to feature selection and binary decision tree design , 1980, Pattern Recognit..

[219]  Roland T. Chin,et al.  An Automated Approach to the Design of Decision Tree Classifiers , 1982 .

[220]  Nader H. Bshouty,et al.  Exact learning via the Monotone theory , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[221]  Henrik I. Christensen,et al.  Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies and Hybrid Systems , 1994 .

[222]  Ronald L. Rivest,et al.  Training a 3-node neural network is NP-complete , 1988, COLT '88.

[223]  Igor Kononenko,et al.  Inductive and Bayesian learning in medical diagnosis , 1993, Appl. Artif. Intell..

[224]  Andrew J. Lundberg,et al.  Discovering Morphemic Suffixes A Case Study In MDL Induction , 1995 .

[225]  G. Pagallo ADAPTATIVE DECISION TREE ALGORITHMS FOR LEARNING FROM EXAMPLES (Ph.D. Thesis) , 1990 .

[226]  Ronald L. Rivest,et al.  Inferring Decision Trees Using the Minimum Description Length Principle , 1989, Inf. Comput..

[227]  Pramod K. Varshney,et al.  Application of information theory to the construction of efficient decision trees , 1982, IEEE Trans. Inf. Theory.

[228]  Jack Sklansky,et al.  Automated design of linear tree classifiers , 1990, Pattern Recognit..

[229]  Lars Kai Hansen,et al.  Neural Network Ensembles , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[230]  N. E. Marquina Regressions by leaps and bounds and biased estimation techniques in yield modeling , 1979 .

[231]  Pamela C. Cosman,et al.  Unbalanced non-binary tree-structured vector quantizers , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[232]  King-Sun Fu,et al.  A method for the design of binary tree classifiers , 1983, Pattern Recognit..

[233]  I. G. BONNER CLAPPISON Editor , 1960, The Electric Power Engineering Handbook - Five Volume Set.

[234]  Jan M. Van Campenhout,et al.  On the Possible Orderings in the Measurement Selection Problem , 1977, IEEE Transactions on Systems, Man, and Cybernetics.

[235]  P. Murphy An Empirical Analysis of the Bene t of Decision Tree Size Biases as a Function of Concept Distribution , 1994 .

[236]  Masahiro Fujita,et al.  Variable ordering algorithms for ordered binary decision diagrams and their evaluation , 1993, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[237]  Adam A. Porter,et al.  Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis , 1988, IEEE Trans. Software Eng..

[238]  Richard S. Johannes,et al.  Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus , 1988 .

[239]  Philip A. Chou,et al.  Optimal pruning with applications to tree-structured source coding and modeling , 1989, IEEE Trans. Inf. Theory.

[240]  Kristin P. Bennett,et al.  Serial and Parallel Multicategory Discrimination , 1994, SIAM J. Optim..

[241]  Ryszard S. Michalski,et al.  Should Decision Trees be Learned from Examples of from Decision Rules? , 1993, ISMIS.

[242]  James S. Thorp,et al.  Decision trees for real-time transient stability prediction , 1994 .

[243]  B. S. Everitt,et al.  Cluster analysis , 2014, Encyclopedia of Social Network Analysis and Mining.

[244]  Quentin F. Stout,et al.  Tree rebalancing in optimal time and space , 1986, CACM.

[245]  Ishwar K. Sethi,et al.  Design of multicategory multifeature split decision trees using perceptron learning , 1994, Pattern Recognit..

[246]  Pat Langley,et al.  Scaling to domains with irrelevant features , 1997, Annual Conference Computational Learning Theory.

[247]  David A. Landgrebe,et al.  Decision boundary feature extraction for nonparametric classification , 1993, IEEE Trans. Syst. Man Cybern..

[248]  E. Roth,et al.  Predicting stroke inpatient rehabilitation outcome using a classification tree approach. , 1994, Archives of physical medicine and rehabilitation.

[249]  Bernard M. E. Moret,et al.  Decision Trees and Diagrams , 1982, CSUR.

[250]  Carey E. Priebe,et al.  COMPARATIVE EVALUATION OF PATTERN RECOGNITION TECHNIQUES FOR DETECTION OF MICROCALCIFICATIONS IN MAMMOGRAPHY , 1993 .

[251]  Jerzy W. Grzymala-Busse,et al.  Global discretization of continuous attributes as preprocessing for machine learning , 1996, Int. J. Approx. Reason..

[252]  G. E. Naumov NP-completeness of problems of construction of optimal decision trees , 1991 .

[253]  Peter A. Flach Predicate Invention in Inductive Data Engineering , 1993, ECML.

[254]  Roger Fletcher,et al.  A Rapidly Convergent Descent Method for Minimization , 1963, Comput. J..

[255]  Steven Salzberg,et al.  Locating Protein Coding Regions in Human DNA Using a Decision Tree Algorithm , 1995, J. Comput. Biol..

[256]  J. Ross Quinlan,et al.  Learning Efficient Classification Procedures and Their Application to Chess End Games , 1983 .

[257]  Peter Eades,et al.  On Optimal Trees , 1981, J. Algorithms.

[258]  C. S. Wallace,et al.  Constructing a Minimal Diagnostic Decision Tree , 1993, Methods of Information in Medicine.

[259]  Alon Orlitsky,et al.  A Spectral Lower Bound Technique for the Size of Decision Trees and Two Level AND/OR Circuits , 1990, IEEE Trans. Computers.

[260]  King-Sun Fu,et al.  Automatic classification of cervical cells using a binary tree classifier , 1983, Pattern Recognition.

[261]  Roberto Todeschini,et al.  Linear discriminant classification tree: A user-driven multicriteria classification method , 1992 .

[262]  Robert E. Schapire,et al.  Predicting Nearly as Well as the Best Pruning of a Decision Tree , 1995, COLT.

[263]  Youngtae Park A comparison of neural net classifiers and linear tree classifiers: Their similarities and differences , 1994, Pattern Recognit..

[264]  Masao Sakauchi,et al.  A Balanced Hierarchical Data Structure for Multidimensional Data with Highly Efficient Dynamic Characteristics , 1993, IEEE Trans. Knowl. Data Eng..

[265]  Steven Salzberg,et al.  Distance Metrics for Instance-Based Learning , 1991, ISMIS.

[266]  W. J. Gibb,et al.  Selection of myocardial electrogram features for use by implantable devices , 1993, IEEE Transactions on Biomedical Engineering.

[267]  Paul Lukowicz,et al.  Experimental evaluation in computer science: A quantitative study , 1995, J. Syst. Softw..

[268]  J. R. Quinlan Discovering rules by induction from large collections of examples , 1979 .

[269]  Joseph J. Mezrich When Is a Tree a Hedge , 1994 .

[270]  J L Talmon,et al.  The effect of noise and biases on the performance of machine learning algorithms. , 1992, International journal of bio-medical computing.

[271]  R B D'Agostino,et al.  A comparison of logistic regression to decision-tree induction in a medical domain. , 1993, Computers and biomedical research, an international journal.

[272]  Lilly Spirkovska Three-dimensional object recognition using similar triangles and decision trees , 1993, Pattern Recognit..

[273]  J. Robin B. Cockett,et al.  Decision tree reduction , 1990, JACM.

[274]  Nikos D. Hatziargyriou,et al.  A decision tree method for on-line steady state security assessment , 1994 .

[275]  R. Olshen,et al.  Predicting chemically induced duodenal ulcer and adrenal necrosis with classification trees. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[276]  Ronald L. Rivest,et al.  Constructing Optimal Binary Decision Trees is NP-Complete , 1976, Inf. Process. Lett..

[277]  G. R. Dattatreya,et al.  Bayesian and Decision Tree Approaches for Pattern Recognition Including Feature Measurement Costs , 1981, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[278]  Jan-Erik Strömberg,et al.  Extraction of diagnostic rules using recursive partitioning systems: A comparison of two approaches , 1992, Artif. Intell. Medicine.

[279]  Mihalis Yannakakis,et al.  The Analysis of Local Search Problems and Their Heuristics , 1990, Symposium on Theoretical Aspects of Computer Science.

[280]  G Reibnegger,et al.  The role of neopterin in assessing disease activity in Crohn's disease: classification and regression trees. , 1993, The American journal of gastroenterology.

[281]  Philip H. Swain,et al.  The decision tree classifier: Design and potential , 1977, IEEE Transactions on Geoscience Electronics.

[282]  George Nagy,et al.  Decision tree design using a probabilistic model , 1984, IEEE Trans. Inf. Theory.

[283]  Julius T. Tou,et al.  Pattern Recognition Principles , 1974 .

[284]  Zhen-Ping Lo,et al.  Development of a two-stage neural network classifier , 1994 .

[285]  Mill Johannes G.A. Van,et al.  Transmission Of Information , 1961 .

[286]  Douglas H. Fisher,et al.  Overcoming process delays with decision tree induction , 1994, IEEE Expert.

[287]  Michael Ian Shamos,et al.  Computational geometry: an introduction , 1985 .

[288]  Lalit R. Bahl,et al.  A tree-based statistical language model for natural language speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[289]  Richard J. Mammone,et al.  Growing and Pruning Neural Tree Networks , 1993, IEEE Trans. Computers.

[290]  G. H. Landeweerd,et al.  Binary tree versus single level tree classification of white blood cells , 1983, Pattern Recognit..

[291]  B. Chandrasekaran,et al.  On dimensionality and sample size in statistical pattern classification , 1971, Pattern Recognit..

[292]  Jie Cheng,et al.  Applying machine learning to semiconductor manufacturing , 1993, IEEE Expert.

[293]  Charalambos Tsatsarakis,et al.  Supporting preprocessing and postprocessing for machine learning algorithms: a workbench for ID3 , 1993 .

[294]  Xiaobo Li,et al.  Tree classifier design with a permutation statistic , 1986, Pattern Recognit..

[295]  M. Ray Mercer,et al.  Least Upper Bounds on OBDD Sizes , 1994, IEEE Trans. Computers.

[296]  R. Olshen,et al.  Asymptotically Efficient Solutions to the Classification Problem , 1978 .

[297]  Jun Bao,et al.  On the Design of a Tree Classifier and its Application to Speech Recognition , 1991, Int. J. Pattern Recognit. Artif. Intell..

[298]  Siddhartha Bhattacharyya,et al.  A review of machine learning in scheduling , 1994 .

[299]  Jon Atli Benediktsson,et al.  Consensus theoretic classification methods , 1992, IEEE Trans. Syst. Man Cybern..

[300]  J. Makhoul,et al.  Vector quantization in speech coding , 1985, Proceedings of the IEEE.

[301]  David George Heath,et al.  A geometric framework for machine learning , 1993 .

[302]  Cullen Schaffer,et al.  Conservation of Generalization: A Case Study , 1995 .

[303]  Sreejit Chakravarty,et al.  A Characterization of Binary Decision Diagrams , 1993, IEEE Trans. Computers.

[304]  Usama M. Fayyad,et al.  What Should Be Minimized in a Decision Tree? , 1990, AAAI.

[305]  J. Ross Quinlan,et al.  Generating Production Rules from Decision Trees , 1987, IJCAI.

[306]  Krzysztof J. Cios,et al.  A machine learning method for generation of a neural network architecture: a continuous ID3 algorithm , 1992, IEEE Trans. Neural Networks.

[307]  Ching Y. Suen,et al.  Analysis and Design of a Decision Tree Based on Entropy Reduction and Its Application to Large Character Set Recognition , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[308]  Donald Michie,et al.  Current developments in expert systems , 1987 .

[309]  David J. Hand,et al.  Discrimination and Classification , 1982 .

[310]  James N. Morgan,et al.  Searching for structure (alias-AID-III) : an approach to analysis of substantial bodies of micro-data and documentation for a computer program (successor to the Automatic Interaction Detector Program) , 1971 .

[311]  Wray L. Buntine,et al.  Learning classification trees , 1992 .

[312]  Steven Salzberg,et al.  Decision Tree Induction: How Effective is the Greedy Heuristic? , 1995, KDD.

[313]  Walter Van de Velde Incremental Induction of Topologically Minimal Trees , 1990, ML.

[314]  Tom M. Mitchell,et al.  Experience with a learning personal assistant , 1994, CACM.

[315]  M. Golea,et al.  A Growth Algorithm for Neural Network Decision Trees , 1990 .

[316]  D. Lindley On a Measure of the Information Provided by an Experiment , 1956 .

[317]  Laveen N. Kanal,et al.  Problem-Solving Models and Search Strategies for Pattern Recognition , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[318]  Satosi Watanabe,et al.  Pattern recognition as a quest for minimum entropy , 1981, Pattern Recognit..

[319]  J A Bentrup,et al.  An examination of inductive learning algorithms for the classification of sleep signals. , 1993, Biomedical sciences instrumentation.

[320]  R A Olshen,et al.  Predicting 1-year outcome following acute myocardial infarction: physicians versus computers. , 1990, Computers and biomedical research, an international journal.

[321]  Michael T. Goodrich,et al.  Decision Tree Construction in Fixed Dimensions: Being Global is Hard but Local Greed is Good , 1995 .

[322]  Peter D. Turney Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm , 1994, J. Artif. Intell. Res..

[323]  Yoshifumi Yasuoka,et al.  Utilization of a best linear discriminant function for designing the binary decision tree , 1991 .

[324]  Robert J. Marks,et al.  A performance comparison of trained multilayer perceptrons and trained classification trees , 1989, Conference Proceedings., IEEE International Conference on Systems, Man and Cybernetics.

[325]  D. Lubinsky Bivariate splits and consistent split criteria in dichotomous classification trees , 1994 .

[326]  H Theron,et al.  CID3: an extension of ID3 for attributes with ordered domains , 1991 .

[327]  Randal E. Bryant,et al.  Symbolic Boolean manipulation with ordered binary-decision diagrams , 1992, CSUR.

[328]  Kevin J. Dooley,et al.  Distinguishing between mean, variance and autocorrelation changes in statistical quality control , 1995 .

[329]  Eyal Kushilevitz,et al.  Learning decision trees using the Fourier spectrum , 1991, STOC '91.

[330]  John Mingers,et al.  Expert Systems—Rule Induction with Statistical Data , 1987 .

[331]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .