A novel fuzzy and multiobjective evolutionary algorithm based gene assignment for clustering short time series expression data

Conventional clustering algorithms based on Euclidean distance or the Pearson correlation coefficient cannot include order information in the distance metric and cannot distinguish between random and real biological patterns. We present a template-based clustering algorithm for time series gene expression data. Template profiles are defined based on the up-down regulation of genes between consecutive time points. Genes are assigned to templates using a fuzzy membership function. A multi-objective evolutionary algorithm is used to determine compact clusters with a varying number of templates. The statistical significance of each template is determined using a permutation-based non-parametric test. Statistically significant profiles are further tested for biological relevance using gene ontology analysis. The algorithm was able to distinguish between real and noisy patterns when tested on artificial and real biological data. On real biological data, the proposed algorithm performed better than or similarly to STEM and better than k-means.
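The template, fuzzy assignment, and permutation-test steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the membership function, the distance, or the permutation scheme, so the fuzzy c-means style membership, the comparison of step directions, the shuffling of time points within each gene, and all function and parameter names (`make_templates`, `fuzzy_memberships`, `assign`, `permutation_pvalues`, `m`, `n_perm`) are assumptions made for illustration. The multi-objective evolutionary search over the number of templates is omitted.

```python
import itertools
import numpy as np


def make_templates(n_timepoints):
    """Enumerate templates whose steps between consecutive points are -1, 0 or +1."""
    steps = itertools.product((-1, 0, 1), repeat=n_timepoints - 1)
    # The cumulative sum turns each step pattern into a profile starting at 0.
    return np.array([np.concatenate(([0], np.cumsum(s))) for s in steps], dtype=float)


def fuzzy_memberships(profile, templates, m=2.0, eps=1e-12):
    """Assumed fuzzy c-means style membership of one profile to every template."""
    # Compare up/down directions between consecutive time points,
    # so the order of the time points matters.
    gene_steps = np.sign(np.diff(profile))
    template_steps = np.diff(templates, axis=1)
    d = np.linalg.norm(template_steps - gene_steps, axis=1) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()


def assign(data, templates):
    """Hard-assign each gene (row of data) to its highest-membership template."""
    return np.array([np.argmax(fuzzy_memberships(g, templates)) for g in data])


def permutation_pvalues(data, templates, n_perm=99, seed=0):
    """Assumed permutation test: shuffle time order within each gene and
    count how often a template attracts at least as many genes as observed."""
    rng = np.random.default_rng(seed)
    k = len(templates)
    observed = np.bincount(assign(data, templates), minlength=k)
    exceed = np.zeros(k)
    for _ in range(n_perm):
        permuted = rng.permuted(data, axis=1)  # each row shuffled independently
        counts = np.bincount(assign(permuted, templates), minlength=k)
        exceed += counts >= observed
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: 50 genes that rise steadily over 4 time points.
rng = np.random.default_rng(1)
templates = make_templates(4)
data = np.tile([0.0, 1.0, 2.0, 3.0], (50, 1)) + rng.normal(0.0, 0.05, (50, 4))
pvals = permutation_pvalues(data, templates)
best = int(np.argmin(pvals))
print(templates[best], pvals[best])
```

In this sketch a template is called significant when far more genes match it than would be expected after destroying the time order, which is the general idea behind a permutation-based test. A real analysis would also correct for testing many templates at once.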
