Implication Networks from Large Gene-expression Datasets

We present a new algorithm for building Boolean networks from very large amounts of gene expression data. The resulting networks include not only symmetric relationships between genes, such as co-expression, but also asymmetric relations that represent if-then rules. The approach is conceptually simple and fast enough that it can build a complete gene network using 3 billion gene pairs with more than 9,500 expression values per gene pair in less than 3 hours on an ordinary office computer. The algorithm was applied to publicly available data from thousands of microarrays for humans, mice, and fruit flies (for a total of 365 million Affymetrix probeset expression levels). The resulting network consists of hundreds of millions of relationships between genes, and contains biologically meaningful information about gender differences, tissue differences, development, differentiation, and co-expression. We also examine relationships that are conserved between humans, mice, and fruit flies. The full Boolean relationships are available for exploration at http://gourd.stanford.edu/~sahoo/recomb07/.
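To make the distinction between symmetric and asymmetric (if-then) relationships concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each gene's expression has already been discretized to low (0) or high (1) per array, and it treats a quadrant of the 2x2 contingency table as "empty" using a simple fraction cutoff; a real analysis of noisy microarray data would need a statistical test instead. The function names and the `max_fraction` parameter are hypothetical.

```python
from itertools import product


def quadrant_counts(a, b):
    """Count arrays falling in each (A, B) low/high quadrant."""
    counts = {(x, y): 0 for x, y in product((0, 1), repeat=2)}
    for x, y in zip(a, b):
        counts[(x, y)] += 1
    return counts


def implications(a, b, max_fraction=0.0):
    """Return the if-then rules suggested by (nearly) empty quadrants.

    Assumption: a quadrant is 'empty' when it holds at most
    max_fraction of all arrays. This cutoff is illustrative only.
    """
    counts = quadrant_counts(a, b)
    total = sum(counts.values())
    # An empty quadrant (A=x, B=y) means "A=x implies B=not y".
    rules = {
        (1, 0): "A high => B high",
        (0, 1): "A low => B low",
        (1, 1): "A high => B low",
        (0, 0): "A low => B high",
    }
    found = []
    for quadrant, rule in rules.items():
        if total and counts[quadrant] <= max_fraction * total:
            found.append(rule)
    return found


# Asymmetric case: A is never high while B is low.
asymmetric = implications([1, 1, 0, 0, 0, 1], [1, 1, 1, 0, 1, 1])

# Symmetric case: two rules together amount to equivalence (co-expression).
symmetric = implications([1, 1, 0, 0], [1, 1, 0, 0])
```

In this sketch a single empty quadrant yields a one-directional rule, while two empty quadrants on the same diagonal yield a symmetric relationship such as co-expression.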
