Phylogenetic Parameter Estimation on COWs

Phylogenetic analysis is a routine task in biological research. The growing amount of biological sequence data makes the computational part of the analysis a bottleneck, and parallel computing is applied to reduce this computational burden. In this chapter we discuss different factors that influence the performance of parallel implementations. Using the example of parameter estimation in the TREE-PUZZLE program, we analyze the performance and speedup of different scheduling algorithms on two different kinds of workstation clusters, which are the most abundant parallel platform in biological research. To that end, different parts of the TREE-PUZZLE program with diverse parallel complexity are examined, and the impact of their characteristics is discussed. In addition, an extended parallelization of the parameter estimation program is introduced.
