Joint stage recognition and anatomical annotation of Drosophila gene expression patterns

Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals detailed spatio-temporal patterns of gene expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To support text-based pattern searching in related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated by domain experts with developmental stage terms and anatomical ontology terms. Because the number of such images is growing rapidly and annotations by human curators are inevitably biased, an automatic method is needed to recognize the developmental stage and annotate anatomical terms.

Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. The proposed Tri-Relational Graph (TG) model comprises the data graph, the anatomical term graph and the developmental stage term graph, which are connected by two additional graphs induced from stage or annotation label assignments. On top of the TG model, we introduce a Preferential Random Walk (PRW) method that jointly recognizes the developmental stage and annotates anatomical terms by exploiting the interrelations between the two tasks. Experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves better prediction results on both tasks than state-of-the-art methods.

Availability: http://ranger.uta.edu/%7eheng/Drosophila/

Contact: heng@uta.edu
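
The idea of walking over three connected graphs can be illustrated with a small sketch. The code below is not the PRW method itself, whose details the abstract does not give; it swaps in a plain random walk with restart over a block adjacency matrix. All names (`build_transition`, `random_walk_with_restart`, `W_x`, `W_a`, `W_s`, `Y_a`, `Y_s`), the restart probability `alpha` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def build_transition(W_x, W_a, W_s, Y_a, Y_s):
    """Row-normalized transition matrix over images, anatomical terms and stage terms.

    W_x: image-image similarities (n x n); W_a, W_s: term-term similarities;
    Y_a, Y_s: known image-to-term label assignments (n x ka, n x ks).
    This block layout is an illustrative assumption, not the paper's definition.
    """
    ka, ks = W_a.shape[0], W_s.shape[0]
    W = np.block([
        [W_x,   Y_a,                Y_s],
        [Y_a.T, W_a,                np.zeros((ka, ks))],
        [Y_s.T, np.zeros((ks, ka)), W_s],
    ])
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave isolated nodes as all-zero rows
    return W / row_sums

def random_walk_with_restart(P, seed, alpha=0.15, tol=1e-12, max_iter=10000):
    """Stationary visiting probabilities of a walk that restarts at `seed`."""
    restart = np.zeros(P.shape[0])
    restart[seed] = 1.0
    p = restart.copy()
    for _ in range(max_iter):
        p_next = (1.0 - alpha) * (P.T @ p) + alpha * restart
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Toy data (hypothetical): 4 images, 2 anatomical terms, 2 stage terms.
# Images 0-1 and 2-3 are similar pairs; only images 0 and 3 carry labels.
W_x = np.array([[0.0, 1.0, 0.0, 0.0],
                [1.0, 0.0, 0.1, 0.0],
                [0.0, 0.1, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])
W_a = np.array([[0.0, 0.2], [0.2, 0.0]])
W_s = np.array([[0.0, 0.2], [0.2, 0.0]])
Y_a = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
Y_s = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

P = build_transition(W_x, W_a, W_s, Y_a, Y_s)
scores = random_walk_with_restart(P, seed=1)  # query: unlabeled image 1

n, ka = W_x.shape[0], W_a.shape[0]
anat_scores = scores[n:n + ka]    # relevance of each anatomical term
stage_scores = scores[n + ka:]    # relevance of each stage term
print("anatomical term ranking:", np.argsort(-anat_scores))
print("stage term ranking:", np.argsort(-stage_scores))
```

In this toy setup the unlabeled query image sits next to labeled image 0, so the walk reaches that image's anatomical term and stage term more often than the others. Both label types are scored in a single walk, which is the sense in which the two tasks share information.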
