5 MCMC Using Hamiltonian Dynamics

Markov chain Monte Carlo (MCMC) originated with the classic paper of Metropolis et al. (1953), where it was used to simulate the distribution of states for a system of idealized molecules. Not long after, another approach to molecular simulation was introduced (Alder and Wainwright, 1959), in which the motion of the molecules was deterministic, following Newton’s laws of motion, which have an elegant formalization as Hamiltonian dynamics. For finding the properties of bulk materials, these approaches are asymptotically equivalent, since even in a deterministic simulation, each local region of the material experiences effectively random influences from distant regions. Despite the large overlap in their application areas, the MCMC and molecular dynamics approaches continued to coexist over the following decades (see Frenkel and Smit, 1996).

In 1987, a landmark paper by Duane, Kennedy, Pendleton, and Roweth united the MCMC and molecular dynamics approaches. They called their method “hybrid Monte Carlo,” which abbreviates to “HMC,” but the phrase “Hamiltonian Monte Carlo,” retaining the abbreviation, is more specific and descriptive, and I will use it here. Duane et al. applied HMC not to molecular simulation, but to lattice field theory simulations of quantum chromodynamics. Statistical applications of HMC began with my use of it for neural network models (Neal, 1996a). I also provided a statistically-oriented tutorial on HMC in a review of MCMC methods (Neal, 1993, Chapter 5). There have been other applications of HMC to statistical problems (e.g. Ishwaran, 1999; Schmidt, 2009) and statistically-oriented reviews (e.g. Liu, 2001, Chapter 9), but HMC still seems to be underappreciated by statisticians, and perhaps also by physicists outside the lattice field theory community.

This review begins by describing Hamiltonian dynamics. Despite terminology that may be unfamiliar outside physics, the features of Hamiltonian dynamics that are needed for HMC are elementary.
The differential equations of Hamiltonian dynamics must be discretized for computer implementation. The “leapfrog” scheme that is typically used is quite simple. Following this introduction to Hamiltonian dynamics, I describe how to use it to construct an MCMC method. The first step is to define a Hamiltonian function in terms of the probability distribution we wish to sample from. In addition to the variables we are interested in (the “position” variables), we must introduce auxiliary “momentum” variables, which typically have independent Gaussian distributions. The HMC method alternates simple updates for these momentum variables with Metropolis updates in which a new state is proposed by computing a trajectory according to Hamiltonian dynamics, implemented with the leapfrog method. A state proposed in this way can be distant from the
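The scheme described above can be made concrete with a short sketch. The following Python/NumPy code is an illustrative implementation, not the one from the original text: it implements the leapfrog discretization (half-step for momentum, alternating full steps, final half momentum half-step) and an HMC sampler with standard Gaussian momenta and a Metropolis accept/reject step on the total energy. The function names, step size, and the Gaussian example target are assumptions made for the sketch.

```python
import numpy as np

def leapfrog(grad_U, q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog method.

    U is the potential energy (minus the log target density), so dq/dt = p
    and dp/dt = -grad_U(q) for unit-mass momenta."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)        # initial half-step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                  # full step for position
        p -= step_size * grad_U(q)          # full step for momentum
    q += step_size * p                      # final full step for position
    p -= 0.5 * step_size * grad_U(q)        # final half-step for momentum
    return q, -p                            # negate momentum to make the proposal reversible

def hmc_sample(U, grad_U, q0, step_size=0.1, n_steps=20, n_samples=1000, rng=None):
    """Basic HMC: resample Gaussian momenta, run leapfrog, Metropolis-accept."""
    if rng is None:
        rng = np.random.default_rng(0)
    q = np.asarray(q0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)    # momentum ~ N(0, I), independent of q
        q_new, p_new = leapfrog(grad_U, q, p, step_size, n_steps)
        # Accept with probability min(1, exp(H_old - H_new)), H = U(q) + |p|^2 / 2
        h_old = U(q) + 0.5 * (p @ p)
        h_new = U(q_new) + 0.5 * (p_new @ p_new)
        if rng.random() < np.exp(h_old - h_new):
            q = q_new
        samples.append(q)
    return np.array(samples)
```

With an exact integrator the energy H would be conserved and every proposal accepted; the Metropolis step corrects for the leapfrog discretization error, so the chain leaves the target distribution exactly invariant. For example, sampling a standard bivariate Gaussian uses `U = lambda q: 0.5 * q @ q` and `grad_U = lambda q: q`.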

[1] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.

[2] Ben Calderhead et al. Riemannian Manifold Hamiltonian Monte Carlo, 2009, arXiv:0907.1100.

[3] Mikkel N. Schmidt. Function factorization using warped Gaussian processes, 2009, ICML '09.

[4] Carl E. Rasmussen et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[5] Radford M. Neal. Regression and Classification Using Gaussian Process Priors, 2009.

[6] Christophe Andrieu et al. A tutorial on adaptive MCMC, 2008, Stat. Comput.

[7] E. Hairer et al. Simulating Hamiltonian dynamics, 2006, Math. Comput.

[8] David J. Earl et al. Parallel tempering: theory, applications, and new perspectives, 2005, Physical Chemistry Chemical Physics (PCCP).

[9] Radford M. Neal. The Short-Cut Metropolis Method, 2005, arXiv:math/0508060.

[10] Scott S. Hampton et al. Shadow hybrid Monte Carlo: an efficient propagator in phase space of macromolecules, 2004.

[11] A. P. Dawid et al. Gaussian Processes to Speed up Hybrid Monte Carlo for Expensive Bayesian Integrals, 2003.

[12] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, 2001, Springer.

[13] Jun S. Liu et al. Multipoint Metropolis method with application to hybrid Monte Carlo, 2001.

[14] A. D. Kennedy et al. Cost of the Generalised Hybrid Monte Carlo Algorithm for Free Field Theory, 2000, arXiv:hep-lat/0008020.

[15] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[16] Kiam Choo. Learning hyperparameters for neural network models using Hamiltonian dynamics, 2000.

[17] H. Ishwaran. Applications of Hybrid Monte Carlo to Bayesian Generalized Linear Models: Quasicomplete Separation and Neural Networks, 1999.

[18] J. Rosenthal et al. Optimal scaling of discrete approximations to Langevin diffusions, 1998.

[19] A. Gelman et al. Weak convergence and optimal scaling of random walk Metropolis algorithms, 1997.

[20] Pal Rujan. Playing Billiards in Version Space, 1997, Neural Computation.

[21] Radford M. Neal. Sampling from multimodal distributions using tempered transitions, 1996, Stat. Comput.

[22] R. Tweedie et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[23] Berend Smit et al. Understanding Molecular Simulation: From Algorithms to Applications, 1996.

[24] Radford M. Neal. Bayesian Learning for Neural Networks, 1995.

[25] Michael I. Miller et al. Representations of Knowledge in Complex Systems, 1994.

[26] S. Caracciolo et al. A general limitation on Monte Carlo algorithms of the Metropolis type, 1993, Physical Review Letters.

[27] Radford M. Neal. An improved acceptance procedure for the hybrid Monte Carlo algorithm, 1992, arXiv:hep-lat/9208011.

[28] G. Parisi et al. Simulated tempering: a new Monte Carlo scheme, 1992, arXiv:hep-lat/9205018.

[29] R. McLachlan et al. The accuracy of symplectic integrators, 1992.

[30] A. Horowitz. A generalized guided Monte Carlo algorithm, 1991.

[31] A. D. Kennedy et al. Acceptances and autocorrelations in hybrid Monte Carlo, 1991.

[32] Sourendu Gupta et al. The acceptance probability in the hybrid Monte Carlo method, 1990.

[33] A. D. Kennedy. The theory of hybrid stochastic algorithms, 1990.

[34] Paul B. Mackenzie. An Improved Hybrid Monte Carlo Method, 1989.

[35] M. Creutz et al. Higher-order hybrid Monte Carlo algorithms, 1989, Physical Review Letters.

[36] M. Creutz. Global Monte Carlo algorithms for many-fermion systems, 1988, Physical Review D.

[37] S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth. Hybrid Monte Carlo, 1987, Physics Letters B.

[38] J. Doll et al. Brownian dynamics as smart Monte Carlo simulation, 1978.

[39] D. W. Noid. Studies in Molecular Dynamics, 1976.

[40] Charles H. Bennett. Mass tensor molecular dynamics, 1975.

[41] V. Arnold. Mathematical Methods of Classical Mechanics, 1974.

[42] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970, Biometrika.

[43] B. Alder et al. Studies in Molecular Dynamics. I. General Method, 1959.

[44] N. Metropolis et al. Equation of State Calculations by Fast Computing Machines, 1953, Journal of Chemical Physics.