Wrapper discretization by means of estimation of distribution algorithms

We present a supervised wrapper approach to discretization. In contrast to many classical approaches, the discretization process is multivariate: all variables are discretized simultaneously, and each candidate discretization is evaluated with the Naive-Bayes classifier. The search for the optimal discretization is carried out as an optimization process guided by the estimated accuracy of the learning model. The global optimization is performed by estimation of distribution algorithms, a recent family of evolutionary algorithms. To evaluate the behaviour of the algorithm, the effect of its different parameters is studied by means of analysis of variance (ANOVA). The evaluation was carried out on artificial datasets and on UCI datasets. The results suggest that the proposed method provides an effective and robust technique for discretizing variables.
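The wrapper loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the EDA variant, the encoding of a discretization, or how accuracy is estimated. The sketch assumes a univariate Gaussian model over cut points (in the spirit of UMDAc), a fixed number of cut points per variable, truncation selection, and holdout accuracy of a Laplace-smoothed naive Bayes as the fitness. All function names and parameters (`eda_discretize`, `nb_accuracy`, `make_toy_data`, `n_cuts`, `pop`, `gens`) are hypothetical.

```python
import bisect
import math
import random
import statistics


def discretize(row, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    return [bisect.bisect_right(sorted(c), x) for x, c in zip(row, cuts)]


def nb_accuracy(train, valid, cuts, n_bins):
    """Holdout accuracy of a Laplace-smoothed naive Bayes on discretized data."""
    classes = sorted({y for _, y in train})
    n_vars = len(cuts)
    prior = {c: 1.0 for c in classes}
    counts = {c: [[1.0] * n_bins for _ in range(n_vars)] for c in classes}
    for row, y in train:
        prior[y] += 1
        for j, b in enumerate(discretize(row, cuts)):
            counts[y][j][b] += 1

    hits = 0
    for row, y in valid:
        bins = discretize(row, cuts)

        def score(c):
            # Unnormalised log posterior: log prior + sum of log likelihoods.
            s = math.log(prior[c])
            for j, b in enumerate(bins):
                s += math.log(counts[c][j][b] / sum(counts[c][j]))
            return s

        hits += max(classes, key=score) == y
    return hits / len(valid)


def eda_discretize(train, valid, n_cuts=1, pop=30, gens=15, seed=0):
    """Search cut points for all variables at once (assumed UMDAc-style EDA)."""
    rng = random.Random(seed)
    n_vars = len(train[0][0])
    cols = [[row[j] for row, _ in train] for j in range(n_vars)]
    # One independent Gaussian per cut point, initialised from the data spread.
    mu = [[statistics.mean(col)] * n_cuts for col in cols]
    sd = [[statistics.pstdev(col) or 1.0] * n_cuts for col in cols]
    best, best_fit = None, -1.0
    for _ in range(gens):
        # Sample a population of complete (multivariate) discretizations.
        population = [
            [[rng.gauss(mu[j][k], sd[j][k]) for k in range(n_cuts)]
             for j in range(n_vars)]
            for _ in range(pop)
        ]
        # Wrapper step: fitness is the classifier's estimated accuracy.
        scored = [(nb_accuracy(train, valid, ind, n_cuts + 1), ind)
                  for ind in population]
        scored.sort(key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # Truncation selection, then re-estimate the probabilistic model.
        elite = [ind for _, ind in scored[: pop // 2]]
        for j in range(n_vars):
            for k in range(n_cuts):
                vals = [ind[j][k] for ind in elite]
                mu[j][k] = statistics.mean(vals)
                sd[j][k] = max(statistics.pstdev(vals), 1e-3)
    return best, best_fit


def make_toy_data(n, seed=0):
    """Toy data: two uniform variables; the class depends only on the first."""
    rng = random.Random(seed)
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    return [(r, int(r[0] > 0.5)) for r in rows]
```

For example, `eda_discretize(data[:140], data[140:])` on `data = make_toy_data(200)` returns one cut point per variable together with the best holdout accuracy found. A cross-validated estimate could replace the single holdout split at the cost of more classifier evaluations per candidate.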
