5 MCMC Using Hamiltonian Dynamics

Markov chain Monte Carlo (MCMC) originated with the classic paper of Metropolis et al. (1953), where it was used to simulate the distribution of states for a system of idealized molecules. Not long after, another approach to molecular simulation was introduced (Alder and Wainwright, 1959), in which the motion of the molecules was deterministic, following Newton’s laws of motion, which have an elegant formalization as Hamiltonian dynamics. For finding the properties of bulk materials, these approaches are asymptotically equivalent, since even in a deterministic simulation, each local region of the material experiences effectively random influences from distant regions. Despite the large overlap in their application areas, the MCMC and molecular dynamics approaches continued to coexist over the following decades (see Frenkel and Smit, 1996).

In 1987, a landmark paper by Duane, Kennedy, Pendleton, and Roweth united the MCMC and molecular dynamics approaches. They called their method “hybrid Monte Carlo,” which abbreviates to “HMC,” but the phrase “Hamiltonian Monte Carlo,” retaining the abbreviation, is more specific and descriptive, and I will use it here. Duane et al. applied HMC not to molecular simulation, but to lattice field theory simulations of quantum chromodynamics. Statistical applications of HMC began with my use of it for neural network models (Neal, 1996a). I also provided a statistically-oriented tutorial on HMC in a review of MCMC methods (Neal, 1993, Chapter 5). There have been other applications of HMC to statistical problems (e.g. Ishwaran, 1999; Schmidt, 2009) and statistically-oriented reviews (e.g. Liu, 2001, Chapter 9), but HMC still seems to be underappreciated by statisticians, and perhaps also by physicists outside the lattice field theory community.

This review begins by describing Hamiltonian dynamics. Despite terminology that may be unfamiliar outside physics, the features of Hamiltonian dynamics that are needed for HMC are elementary.
The differential equations of Hamiltonian dynamics must be discretized for computer implementation. The “leapfrog” scheme that is typically used is quite simple. Following this introduction to Hamiltonian dynamics, I describe how to use it to construct an MCMC method. The first step is to define a Hamiltonian function in terms of the probability distribution we wish to sample from. In addition to the variables we are interested in (the “position” variables), we must introduce auxiliary “momentum” variables, which typically have independent Gaussian distributions. The HMC method alternates simple updates for these momentum variables with Metropolis updates in which a new state is proposed by computing a trajectory according to Hamiltonian dynamics, implemented with the leapfrog method. A state proposed in this way can be distant from the
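The scheme described above can be made concrete with a short sketch. The following Python/NumPy code is an illustrative implementation, not the one from the original text: it implements the leapfrog discretization (half-step for momentum, alternating full steps, final half momentum half-step) and an HMC sampler with standard Gaussian momenta and a Metropolis accept/reject step on the total energy. The function names, step size, and the Gaussian example target are assumptions made for the sketch.

```python
import numpy as np

def leapfrog(grad_U, q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog method.

    U is the potential energy (minus the log target density), so dq/dt = p
    and dp/dt = -grad_U(q) for unit-mass momenta."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)        # initial half-step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                  # full step for position
        p -= step_size * grad_U(q)          # full step for momentum
    q += step_size * p                      # final full step for position
    p -= 0.5 * step_size * grad_U(q)        # final half-step for momentum
    return q, -p                            # negate momentum to make the proposal reversible

def hmc_sample(U, grad_U, q0, step_size=0.1, n_steps=20, n_samples=1000, rng=None):
    """Basic HMC: resample Gaussian momenta, run leapfrog, Metropolis-accept."""
    if rng is None:
        rng = np.random.default_rng(0)
    q = np.asarray(q0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)    # momentum ~ N(0, I), independent of q
        q_new, p_new = leapfrog(grad_U, q, p, step_size, n_steps)
        # Accept with probability min(1, exp(H_old - H_new)), H = U(q) + |p|^2 / 2
        h_old = U(q) + 0.5 * (p @ p)
        h_new = U(q_new) + 0.5 * (p_new @ p_new)
        if rng.random() < np.exp(h_old - h_new):
            q = q_new
        samples.append(q)
    return np.array(samples)
```

With an exact integrator the energy H would be conserved and every proposal accepted; the Metropolis step corrects for the leapfrog discretization error, so the chain leaves the target distribution exactly invariant. For example, sampling a standard bivariate Gaussian uses `U = lambda q: 0.5 * q @ q` and `grad_U = lambda q: q`.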

[1] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.

[2] Ben Calderhead et al. Riemannian Manifold Hamiltonian Monte Carlo, 2009, arXiv:0907.1100.

[3] Mikkel N. Schmidt. Function factorization using warped Gaussian processes, 2009, ICML '09.

[4] Carl E. Rasmussen et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[5] Radford M. Neal. Regression and Classification Using Gaussian Process Priors, 2009.

[6] Christophe Andrieu et al. A tutorial on adaptive MCMC, 2008, Stat. Comput.

[7] E. Hairer et al. Simulating Hamiltonian dynamics, 2006, Math. Comput.

[8] David J. Earl et al. Parallel tempering: theory, applications, and new perspectives, 2005, Physical Chemistry Chemical Physics (PCCP).

[9] Radford M. Neal. The Short-Cut Metropolis Method, 2005, arXiv:math/0508060.

[10] Scott S. Hampton et al. Shadow hybrid Monte Carlo: an efficient propagator in phase space of macromolecules, 2004.

[11] A. P. Dawid et al. Gaussian Processes to Speed up Hybrid Monte Carlo for Expensive Bayesian Integrals, 2003.

[12] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, 2001, Springer.

[13] Jun S. Liu et al. Multipoint Metropolis method with application to hybrid Monte Carlo, 2001.

[14] A. D. Kennedy et al. Cost of the Generalised Hybrid Monte Carlo Algorithm for Free Field Theory, 2000, arXiv:hep-lat/0008020.

[15] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[16] Kiam Choo. Learning hyperparameters for neural network models using Hamiltonian dynamics, 2000.

[17] H. Ishwaran. Applications of Hybrid Monte Carlo to Bayesian Generalized Linear Models: Quasicomplete Separation and Neural Networks, 1999.

[18] J. Rosenthal et al. Optimal scaling of discrete approximations to Langevin diffusions, 1998.

[19] A. Gelman et al. Weak convergence and optimal scaling of random walk Metropolis algorithms, 1997.

[20] Pal Rujan. Playing Billiards in Version Space, 1997, Neural Computation.

[21] Radford M. Neal. Sampling from multimodal distributions using tempered transitions, 1996, Stat. Comput.

[22] R. Tweedie et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[23] Berend Smit et al. Understanding Molecular Simulation: From Algorithms to Applications, 1996.

[24] Radford M. Neal. Bayesian Learning for Neural Networks, 1995.

[25] Michael I. Miller et al. Representations of Knowledge in Complex Systems, 1994.

[26] S. Caracciolo et al. A general limitation on Monte Carlo algorithms of the Metropolis type, 1993, Physical Review Letters.

[27] Radford M. Neal. An improved acceptance procedure for the hybrid Monte Carlo algorithm, 1992, arXiv:hep-lat/9208011.

[28] G. Parisi et al. Simulated tempering: a new Monte Carlo scheme, 1992, arXiv:hep-lat/9205018.

[29] R. McLachlan et al. The accuracy of symplectic integrators, 1992.

[30] A. Horowitz. A generalized guided Monte Carlo algorithm, 1991.

[31] A. D. Kennedy et al. Acceptances and autocorrelations in hybrid Monte Carlo, 1991.

[32] Sourendu Gupta et al. The acceptance probability in the hybrid Monte Carlo method, 1990.

[33] A. D. Kennedy. The theory of hybrid stochastic algorithms, 1990.

[34] Paul B. Mackenzie. An Improved Hybrid Monte Carlo Method, 1989.

[35] M. Creutz et al. Higher-order hybrid Monte Carlo algorithms, 1989, Physical Review Letters.

[36] M. Creutz. Global Monte Carlo algorithms for many-fermion systems, 1988, Physical Review D.

[37] S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth. Hybrid Monte Carlo, 1987, Physics Letters B.

[38] J. Doll et al. Brownian dynamics as smart Monte Carlo simulation, 1978.

[39] D. W. Noid. Studies in Molecular Dynamics, 1976.

[40] Charles H. Bennett. Mass tensor molecular dynamics, 1975.

[41] V. Arnold. Mathematical Methods of Classical Mechanics, 1974.

[42] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970, Biometrika.

[43] B. Alder et al. Studies in Molecular Dynamics. I. General Method, 1959.

[44] N. Metropolis et al. Equation of State Calculations by Fast Computing Machines, 1953, Journal of Chemical Physics.