PHYLOGENETIC INFERENCE VIA SEQUENTIAL MONTE CARLO Phylogenetic Inference via Sequential Monte Carlo

—Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov Chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets, and show how to apply this framework—which we refer to as PosetSMC—to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/∼bouchard/PosetSMC. (

[1]  P. Moral,et al.  On adaptive resampling strategies for sequential Monte Carlo methods , 2012, 1203.0464.

[2]  Joshua S. Paul,et al.  An Accurate Sequentially Markov Conditional Sampling Distribution for the Coalescent With Recombination , 2011, Genetics.

[3]  Ming-Hui Chen,et al.  Improving marginal likelihood estimation for Bayesian phylogenetic model selection. , 2011, Systematic biology.

[4]  Ming-Hui Chen,et al.  Choosing among Partition Models in Bayesian Phylogenetics , 2010, Molecular biology and evolution.

[5]  Marc A Suchard,et al.  Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data. , 2010, The annals of applied statistics.

[6]  A. Doucet,et al.  Particle Markov chain Monte Carlo methods , 2010 .

[7]  Marc A. Suchard,et al.  Many-core algorithms for statistical phylogenetics , 2009, Bioinform..

[8]  Yee Whye Teh,et al.  An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering , 2008, NIPS.

[9]  M. Feldman,et al.  Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation , 2008 .

[10]  A. Doucet,et al.  A Tutorial on Particle Filtering and Smoothing: Fifteen years later , 2008 .

[11]  Yee Whye Teh,et al.  Bayesian Agglomerative Clustering with Coalescents , 2007, NIPS.

[12]  Christian P. Robert,et al.  The Bayesian choice : from decision-theoretic foundations to computational implementation , 2007 .

[13]  Haikady N. Nagaraja,et al.  Inference in Hidden Markov Models , 2006, Technometrics.

[14]  H. Philippe,et al.  Computing Bayes factors using thermodynamic integration. , 2006, Systematic biology.

[15]  S. Ho,et al.  Relaxed Phylogenetics and Dating with Confidence , 2006, PLoS biology.

[16]  B. Rannala,et al.  Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference , 1996, Journal of Molecular Evolution.

[17]  R. Douc,et al.  Limit theorems for weighted samples with applications to sequential Monte Carlo methods , 2005, math/0507042.

[18]  M. Suchard,et al.  Joint Bayesian estimation of alignment and phylogeny. , 2005, Systematic biology.

[19]  Thomas M. Keane,et al.  DPRml: distributed phylogeny reconstruction by maximum likelihood , 2005, Bioinform..

[20]  M. Kimura A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences , 1980, Journal of Molecular Evolution.

[21]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[22]  Elena Rivas,et al.  Evolutionary models for insertions and deletions in a probabilistic modeling framework , 2005, BMC Bioinformatics.

[23]  M. De Iorio,et al.  Importance sampling on coalescent histories. I , 2004, Advances in Applied Probability.

[24]  D. Haussler,et al.  Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. , 2003, Molecular biology and evolution.

[25]  Sandhya Dwarkadas,et al.  Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference , 2002, Bioinform..

[26]  Paul Marjoram,et al.  Markov chain Monte Carlo without likelihoods , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[27]  Xizhou Feng,et al.  Parallel algorithms for Bayesian phylogenetic inference , 2003, J. Parallel Distributed Comput..

[28]  Timothy J. Robinson,et al.  Sequential Monte Carlo Methods in Practice , 2003 .

[29]  P. Moral,et al.  Sequential Monte Carlo samplers , 2002, cond-mat/0212648.

[30]  D. Balding,et al.  Approximate Bayesian computation in population genetics. , 2002, Genetics.

[31]  A. Doucet,et al.  A survey of convergence results on particle ltering for practitioners , 2002 .

[32]  Nan Yu,et al.  The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs , 2002, BMC Bioinformatics.

[33]  Jonathan P. Bollback,et al.  Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology , 2001, Science.

[34]  John P. Huelsenbeck,et al.  MRBAYES: Bayesian inference of phylogenetic trees , 2001, Bioinform..

[35]  J. Huelsenbeck,et al.  A compound poisson process for relaxing the molecular clock. , 2000, Genetics.

[36]  Jun S. Liu,et al.  The Multiple-Try Method and Local Optimization in Metropolis Sampling , 2000 .

[37]  P. Fearnhead,et al.  An improved particle filter for non-linear problems , 1999 .

[38]  H. Kishino,et al.  Estimating the rate of evolution of the rate of molecular evolution. , 1998, Molecular biology and evolution.

[39]  Xiao-Li Meng,et al.  Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling , 1998 .

[40]  Radford M. Neal Sampling from multimodal distributions using tempered transitions , 1996, Stat. Comput..

[41]  Robert C. Griffiths,et al.  Monte Carlo inference methods in population genetics , 1996 .

[42]  G. Kitagawa Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models , 1996 .

[43]  Brian Jefferies Feynman-Kac Formulae , 1996 .

[44]  J. Felsenstein,et al.  A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. , 1994, Molecular biology and evolution.

[45]  Jun S. Liu,et al.  Sequential Imputations and Bayesian Missing Data Problems , 1994 .

[46]  M. Newton Approximate Bayesian-inference With the Weighted Likelihood Bootstrap , 1994 .

[47]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[48]  Wang,et al.  Replica Monte Carlo simulation of spin glasses. , 1986, Physical review letters.

[49]  S. Tavaré Some probabilistic and statistical problems in the analysis of DNA sequences , 1986 .

[50]  D. Robinson,et al.  Comparison of phylogenetic trees , 1981 .

[51]  J. Felsenstein Maximum-likelihood estimation of evolutionary trees from continuous characters. , 1973, American journal of human genetics.

[52]  J. Doob Markoff chains—denumerable case , 1945 .