Discovery of Decision Rules from Databases: An Evolutionary Approach

Decision rules are a natural form of representing knowledge. Extracting them from databases requires the ability to search large solution spaces effectively. This paper shows how this problem can be addressed with evolutionary algorithms (EAs). We propose an EA-based system called EDRL, which, for each class label, sequentially generates a disjunctive set of decision rules in propositional form. EDRL uses an EA to search for one rule at a time; all the positive examples covered by that rule are then removed from the learning set, and the search is repeated on the remaining examples. Our version of the EA differs from the standard genetic algorithm: in addition to the well-known uniform crossover, it employs two non-standard genetic operators, which we call changing condition and insertion. Currently, EDRL requires prior discretization of all continuous-valued attributes; a discretization technique based on the minimization of class entropy is used. The performance of EDRL is evaluated by comparing its classification accuracy with that of the C4.5 learning algorithm on six datasets from the UCI repository.
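
The covering loop described above (find one rule, remove the positive examples it covers, repeat) can be illustrated with a minimal Python sketch. It is not the EDRL implementation: the names `covers`, `best_single_condition_rule`, and `sequential_covering` are hypothetical, rules are assumed to be conjunctions of attribute = value tests over already-discretized attributes, and the evolutionary search is replaced by a simple greedy stand-in so that the example stays short and runnable.

```python
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, str]        # attribute name -> discretized value
Example = Tuple[Attrs, str]   # (attribute values, class label)
Rule = Dict[str, str]         # conjunction of attribute == value tests


def covers(rule: Rule, attrs: Attrs) -> bool:
    """A rule covers an example when every one of its tests holds."""
    return all(attrs.get(a) == v for a, v in rule.items())


def best_single_condition_rule(pos: List[Example], neg: List[Example]) -> Rule:
    """Stand-in for the EA search (assumption, not the paper's method):
    choose the single test maximizing positives covered minus negatives covered."""
    candidates = sorted({(a, v) for attrs, _ in pos for a, v in attrs.items()})

    def score(cond: Tuple[str, str]) -> int:
        rule = {cond[0]: cond[1]}
        p = sum(covers(rule, x) for x, _ in pos)
        n = sum(covers(rule, x) for x, _ in neg)
        return p - n

    a, v = max(candidates, key=score)
    return {a: v}


def sequential_covering(
    examples: List[Example],
    target: str,
    find_rule: Callable[[List[Example], List[Example]], Rule] = best_single_condition_rule,
) -> List[Rule]:
    """Build a disjunctive rule set for one class label, one rule at a time."""
    pos = [e for e in examples if e[1] == target]
    neg = [e for e in examples if e[1] != target]
    rules: List[Rule] = []
    while pos:
        rule = find_rule(pos, neg)
        remaining = [e for e in pos if not covers(rule, e[0])]
        if len(remaining) == len(pos):
            break  # the rule covers no remaining positive example; stop
        rules.append(rule)
        pos = remaining  # only uncovered positives stay in the learning set
    return rules
```

Passing a different `find_rule` callable is where an evolutionary search, such as the one the paper proposes, would plug in; the outer loop is unchanged.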
