A review of feature selection techniques via gene expression profiles

The invention of DNA microarray technology has modernized biological research: scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Although this technology has ushered in a new era in molecular classification, interpreting microarray data remains challenging because of its inherent “high dimensional, low sample size” nature. Robust and accurate feature selection methods are therefore required to identify genes that are differentially expressed across samples, for example between cancerous and normal cells. Successful feature selection helps to classify different cancer types correctly, which in turn can lead to a better understanding of the genetic signatures of cancers and to improved treatment strategies. This paper reviews the feature selection techniques that have been employed in microarray data analysis and also addresses other problems associated with such analysis. Several trends are noted: a heavy reliance on filter techniques compared with wrapper and embedded techniques, a growing move towards ensemble feature selection, and a prospective extension of feature selection to combinations of heterogeneous data sources.
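To make the filter and ensemble ideas concrete, the following is a minimal Python sketch, not a method taken from any of the reviewed studies. It assumes a two-class data set (for example tumour vs normal) stored as plain lists, scores each gene independently with Welch's t-statistic (one common filter criterion, chosen here as an assumption), and then forms a simple ensemble by averaging each gene's rank over bootstrap resamples. The function names `welch_t`, `filter_rank` and `ensemble_rank` and the toy data are hypothetical, and each class is assumed to contain at least two samples.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for one gene: class-A values vs class-B values."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0.0:
        return 0.0  # no within-class spread at all: treat as uninformative
    return (mean(a) - mean(b)) / se


def filter_rank(X, y):
    """Filter method: rank genes (columns of X) by |t|, most discriminative first.

    X is a list of samples, each a list of expression values (one per gene);
    y holds a 0/1 label per sample (assumed here: 0 = normal, 1 = tumour).
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(welch_t(a, b)))  # each gene scored on its own
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def ensemble_rank(X, y, n_boot=50, seed=0):
    """Hypothetical ensemble filter: average each gene's rank over bootstraps."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    total = [0.0] * n_genes
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    for _ in range(n_boot):
        # resample within each class so both classes keep their original size
        idx = [rng.choice(pos) for _ in pos] + [rng.choice(neg) for _ in neg]
        ranking = filter_rank([X[i] for i in idx], [y[i] for i in idx])
        for r, g in enumerate(ranking):
            total[g] += r  # lower accumulated rank = more consistently selected
    return sorted(range(n_genes), key=lambda g: total[g])


# Toy data (invented for illustration): 8 samples x 3 genes; only gene 1 differs.
X = [
    [5.0, 9.1, 1.0], [5.2, 9.3, 1.2], [4.9, 8.8, 0.9], [5.1, 9.0, 1.1],  # tumour
    [5.0, 2.1, 1.0], [5.3, 2.0, 1.1], [4.8, 2.3, 0.9], [5.1, 1.9, 1.2],  # normal
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

print(filter_rank(X, y)[:1])  # → [1]
print(ensemble_rank(X, y)[:1])
```

Because each gene is scored on its own, the filter step stays cheap even for thousands of genes, which is commonly cited as one reason filter techniques dominate in microarray studies; a wrapper would instead re-train a classifier for every candidate subset. Averaging ranks over resamples is intended to make the selected gene list less sensitive to which few samples happen to be available.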
