Gene Expression Classification: Decision Trees vs. SVMs

In this article, we compare decision trees (DT) and support vector machines (SVM) for classifying gene expressions. With the explosion of genome research, a tremendous amount of data has become available, and in-depth analysis of it is increasingly in demand. Among the gene analysis approaches being developed, sequence-based gene expression classification is important because it can identify the presence of specific gene segments. We focus on two major categories of classification methods: decision trees and support vector machines. We compare several versions of decision tree algorithms, standard SVMs, and a particular SVM that integrates structural information about the gene sequence. The comparison shows that the structural information helps achieve higher classification accuracy.
