Mining Contrast Inequalities in Numeric Dataset

Finding relational expressions which exist frequently in one class of data while not in the other class of data is an interesting work. In this paper, a relational expression of this kind is defined as a contrast inequality. Gene Expression Programming (GEP) is powerful to discover relations from data and express them in mathematical level. Hence, it is desirable to apply GEP to such mining task. The main contributions of this paper include: (1) introducing the concept of contrast inequality mining, (2) designing a two-genome chromosome structure to guarantee that each individual in GEP is a valid inequality, (3) proposing a new genetic mutation to improve the efficiency of evolving contrast inequalities, (4) presenting a GEP-based method to discover contrast inequalities, (5) giving an extensive performance study on real-world datasets. The experimental results show that the proposed methods are effective. Contrast inequalities with high discriminative power are discovered from the real-world datasets. Some potential works on contrast inequality mining are discussed.

[1]  James Bailey,et al.  Fast Algorithms for Mining Emerging Patterns , 2002, PKDD.

[2]  Kotagiri Ramamohanarao,et al.  Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets , 2000, KDD '00.

[3]  Liang Tang,et al.  Mining Class Contrast Functions by Gene Expression Programming , 2009, ADMA.

[4]  Jan Komorowski,et al.  Principles of Data Mining and Knowledge Discovery , 2001, Lecture Notes in Computer Science.

[5]  James Bailey,et al.  A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns , 2003, Third IEEE International Conference on Data Mining.

[6]  Meng Li,et al.  Stream Operators for Querying Data Streams , 2005, WAIM.

[7]  Xiangji Huang,et al.  Diverging patterns: discovering significant frequency change dissimilarities in large databases , 2009, CIKM.

[8]  Cândida Ferreira,et al.  Gene Expression Programming: A New Adaptive Algorithm for Solving Problems , 2001, Complex Syst..

[9]  Jinyan Li,et al.  Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. , 2002 .

[10]  Cândida Ferreira,et al.  Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence (Studies in Computational Intelligence) , 2006 .

[11]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[12]  Mohammed J. Zaki,et al.  Large-Scale Parallel Data Mining , 2002, Lecture Notes in Computer Science.

[13]  Kotagiri Ramamohanarao,et al.  An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification , 2002, PAKDD.

[14]  Jinyan Li,et al.  CAEP: Classification by Aggregating Emerging Patterns , 1999, Discovery Science.

[15]  Heitor Silvério Lopes,et al.  EGIPSYS: AN ENHANCED GENE EXPRESSION PROGRAMMING APPROACH FOR SYMBOLIC REGRESSION PROBLEMS † , 2004 .

[16]  Jinyan Li,et al.  Mining statistically important equivalence classes and delta-discriminative emerging patterns , 2007, KDD '07.

[17]  Candida Ferreira Gene expression programming , 2006 .

[18]  Cândida Ferreira,et al.  Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence , 2014, Studies in Computational Intelligence.

[19]  Wei Jiang,et al.  Data Mining Methods and Applications , 2006 .

[20]  James Bailey,et al.  Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams , 2006, KDD '06.

[21]  Kotagiri Ramamohanarao,et al.  DeEPs: A New Instance-Based Lazy Discovery and Classification System , 2004, Machine Learning.

[22]  Huiqing Liu,et al.  Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients , 2003, Bioinform..

[23]  Weimin Xiao,et al.  Evolving accurate and compact classification rules with gene expression programming , 2003, IEEE Trans. Evol. Comput..

[24]  Changjie Tang,et al.  Time Series Prediction Based on Gene Expression Programming , 2004, WAIM.

[25]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.