TREEGL: reverse engineering tree-evolving gene networks underlying developing biological lineages

Motivation: Estimating gene regulatory networks over biological lineages is central to a deeper understanding of how cells evolve during development and differentiation. One challenge in estimating such evolving networks is that their host cells not only evolve continuously, but also branch over time. For example, a stem cell gives rise to two more specialized daughter cells at each division, forming a tree of networks. Another example arises in a laboratory setting: a biologist may apply several different drugs individually to malignant cancer cells to analyze the effect of each drug. The cells treated with one drug may not be intrinsically similar to those treated with another, but each treated population is similar to the malignant cancer cells it was derived from.

Results: We propose a novel algorithm, Treegl, an ℓ1 plus total variation penalized linear regression method, to estimate multiple gene networks corresponding to cell types related by a tree-genealogy, based on only a few samples from each cell type. Treegl takes advantage of the similarity between related networks along the biological lineage, while at the same time exposing sharp differences between the networks. In simulations, our algorithm performs significantly better than existing methods. Furthermore, we explore an application to a breast cancer dataset and show that our algorithm produces biologically valid results that provide insight into the progression and reversion of breast cancer cells.

Availability: Software will be available at http://www.sailing.cs.cmu.edu/.

Contact: epxing@cs.cmu.edu
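To make the phrase "ℓ1 plus total variation penalized linear regression" concrete, the sketch below evaluates one plausible form of such an objective for a single target gene across a tree of cell types. It is an illustration under stated assumptions, not the published Treegl formulation: the function name `tree_tv_objective`, the dictionary layout, and the weights `lam_sparse` and `lam_tv` are all hypothetical, and the abstract does not specify how the penalties are weighted or how the problem is optimized.

```python
from typing import Dict, List, Optional

Vector = List[float]


def tree_tv_objective(
    X: Dict[str, List[Vector]],       # X[c]: samples (rows) of predictor-gene expression for cell type c
    y: Dict[str, Vector],             # y[c]: expression of the target gene in each sample of cell type c
    beta: Dict[str, Vector],          # beta[c]: regression coefficients (candidate edges) for cell type c
    parent: Dict[str, Optional[str]], # parent[c]: parent cell type in the lineage tree, None for the root
    lam_sparse: float,                # hypothetical weight of the l1 (sparsity) penalty
    lam_tv: float,                    # hypothetical weight of the total variation penalty
) -> float:
    """Squared-error loss + l1 penalty + total variation along tree edges."""
    total = 0.0
    for cell_type, coef in beta.items():
        # Squared-error loss of the linear regression within this cell type.
        for row, target in zip(X[cell_type], y[cell_type]):
            prediction = sum(w * x for w, x in zip(coef, row))
            total += (target - prediction) ** 2

        # l1 penalty: favors sparse networks in every cell type.
        total += lam_sparse * sum(abs(w) for w in coef)

        # Total variation penalty: favors coefficients equal to those of the
        # parent cell type, so a child differs from its parent in few edges.
        p = parent[cell_type]
        if p is not None:
            total += lam_tv * sum(abs(w - pw) for w, pw in zip(coef, beta[p]))
    return total


# Toy lineage: a stem cell type with two daughter cell types, "a" and "b".
X = {"stem": [[1.0, 0.0]], "a": [[0.0, 1.0]], "b": [[1.0, 1.0]]}
y = {"stem": [1.0], "a": [2.0], "b": [0.0]}
beta = {"stem": [1.0, 0.0], "a": [1.0, 2.0], "b": [0.0, 0.0]}
parent = {"stem": None, "a": "stem", "b": "stem"}

value = tree_tv_objective(X, y, beta, parent, lam_sparse=0.5, lam_tv=2.0)
print(value)
```

In this form the total variation term is what couples the cell types: with `lam_tv` set to zero the problem separates into independent lasso regressions, one per cell type, whereas a large `lam_tv` pushes each daughter network toward its parent and leaves only the sharpest differences.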
