Phylogenomics with paralogs

Significance

We demonstrate that the distribution of paralogs in large gene families contains, in itself, sufficient phylogenetic signal to infer fully resolved species phylogenies. This source of phylogenetic information is independent of the information contained in orthologous sequences and is resilient against horizontal gene transfer. An important consequence is that phylogenomics data sets need not be restricted to 1:1 orthologs.

Phylogenomics relies heavily on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can reduce the noise sufficiently to find correct event-annotated gene trees. The information in these gene trees can then be translated directly into constraints on the species tree. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
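The link between an orthology estimate and an event-annotated gene tree can be illustrated with a small sketch. It assumes the standard characterization that an orthology relation corresponds to an event-labeled tree exactly when its graph is a cograph, and it only performs recognition: it does not implement cograph editing, which is the noise-correction step described above. The function names `components` and `cotree`, the input format (a dict mapping each gene to the set of its estimated orthologs), and the gene names are assumptions made for illustration, not the authors' implementation.

```python
def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend((adj[v] & vertices) - comp)
        seen |= comp
        comps.append(comp)
    return comps


def cotree(vertices, adj):
    """Return an event-labeled tree as nested tuples, or None if the
    orthology graph induced by `vertices` is not a cograph.

    An edge means "estimated orthologs". A disconnected graph splits at a
    duplication (genes in different parts are paralogs); a graph whose
    complement is disconnected splits at a speciation (genes in different
    parts are orthologs).
    """
    vertices = set(vertices)
    if len(vertices) == 1:
        return next(iter(vertices))
    comps = components(vertices, adj)
    label = "duplication"
    if len(comps) == 1:
        # Graph is connected: try splitting its complement instead.
        co_adj = {v: vertices - adj[v] - {v} for v in vertices}
        comps = components(vertices, co_adj)
        label = "speciation"
        if len(comps) == 1:
            return None  # contains an induced P4, so no such tree exists
    children = [cotree(c, adj) for c in comps]
    if any(child is None for child in children):
        return None
    return (label, children)


# Hypothetical toy data: three mutually orthologous genes.
triangle = {"a1": {"b1", "c1"}, "b1": {"a1", "c1"}, "c1": {"a1", "b1"}}
print(cotree(triangle, triangle))

# A path on four genes (P4) is the forbidden pattern for cographs.
p4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(cotree(p4, p4))
```

When the function returns None, the estimate is inconsistent with any event-labeled gene tree, which is the situation in which an editing step would be needed before species-tree constraints can be read off.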
