Probabilistic Inference Using Markov Chain Monte Carlo Methods

Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difficulties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over high-dimensional spaces. Related problems in other fields have been tackled using Monte Carlo methods based on sampling using Markov chains, providing a rich array of techniques that can be applied to problems in artificial intelligence. The “Metropolis algorithm” has been used to solve difficult problems in statistical physics for over forty years, and, in the last few years, the related method of “Gibbs sampling” has been applied to problems of statistical inference. Concurrently, an alternative method for solving problems in statistical physics by means of dynamical simulation has been developed as well, and has recently been unified with the Metropolis algorithm to produce the “hybrid Monte Carlo” method. In computer science, Markov chain sampling is the basis of the heuristic optimization technique of “simulated annealing”, and has recently been used in randomized algorithms for approximate counting of large sets. In this review, I outline the role of probabilistic inference in artificial intelligence, present the theory of Markov chains, and describe various Markov chain Monte Carlo algorithms, along with a number of supporting techniques. I try to present a comprehensive picture of the range of methods that have been developed, including techniques from the varied literature that have not yet seen wide application in artificial intelligence, but which appear relevant. As illustrative examples, I use the problems of probabilistic inference in expert systems, discovery of latent classes from data, and Bayesian learning for neural networks.
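As a rough illustration of the simplest method named above, the following is a minimal sketch of a random-walk Metropolis sampler. It is not taken from the review: the function names (`metropolis`, `standard_normal_log_density`), the Gaussian proposal, the step size, and the one-dimensional standard normal target are all assumptions chosen to keep the example short.

```python
import math
import random


def metropolis(log_density, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density: log of the target density, up to an additive constant
                 (hypothetical callable supplied by the user).
    Returns the list of visited states, one per iteration.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: perturb the current state with Gaussian noise.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        # On rejection the current state is recorded again.
        samples.append(x)
    return samples


def standard_normal_log_density(x):
    # Illustrative target: log density of N(0, 1), without the constant.
    return -0.5 * x * x


samples = metropolis(standard_normal_log_density, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.3f}, variance ~ {variance:.3f}")
```

Because the proposal is symmetric, the acceptance test needs only the ratio of target densities, so the normalizing constant never has to be computed. Successive samples are correlated, and estimates such as the mean and variance here converge more slowly than they would with independent draws.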

[80]  Mark Jerrum,et al.  Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains , 1987, International Workshop on Graph-Theoretic Concepts in Computer Science.

[81]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[82]  L. Ingber Very fast simulated re-annealing , 1989 .

[83]  Creutz,et al.  Higher-order hybrid Monte Carlo algorithms. , 1989, Physical review letters.

[84]  D. Greig,et al.  Exact Maximum A Posteriori Estimation for Binary Images , 1989 .

[85]  A. Kennedy The theory of hybrid stochastic algorithms , 1990 .

[86]  T. Fearn,et al.  Bayesian statistics : principles, models, and applications , 1990 .

[87]  D. L. Freeman,et al.  Reducing Quasi-Ergodic Behavior in Monte Carlo Simulations by J-Walking: Applications to Atomic Clusters , 1990 .

[88]  P. Diggle Time Series: A Biostatistical Introduction , 1990 .

[89]  Igor I. Sheykhet,et al.  Monte Carlo method in the theory of solutions , 1990 .

[90]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[91]  J. Morales,et al.  Statistical error method in computer simulations , 1990 .

[92]  Melanie Mitchell,et al.  The emergence of understanding in a computer model of concepts analogy-making , 1990 .

[93]  F. Guess Bayesian Statistics: Principles, Models, and Applications , 1990 .

[94]  David J. Spiegelhalter,et al.  Sequential updating of conditional probabilities on directed graphical structures , 1990, Networks.

[95]  C. Geyer Markov Chain Monte Carlo Maximum Likelihood , 1991 .

[96]  A. Horowitz A generalized guided Monte Carlo algorithm , 1991 .

[97]  John Geweke,et al.  Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments , 1991 .

[98]  D. Frenkel Free-energy calculations , 1991 .

[99]  Robin Hanson,et al.  Bayesian Classification with Correlation and Inheritance , 1991, IJCAI.

[100]  Eugene Charniak,et al.  Bayesian Networks without Tears , 1991, AI Mag..

[101]  John K. Goutsias Unilateral approximation of Gibbs random field images , 1991, CVGIP Graph. Model. Image Process..

[102]  Adrian F. M. Smith,et al.  Efficient generation of random variates via the ratio-of-uniforms method , 1991 .

[103]  H. Wozniakowski Average case complexity of multivariate integration , 1991 .

[104]  P. Diaconis,et al.  Geometric Bounds for Eigenvalues of Markov Chains , 1991 .

[105]  Robert L. Smith,et al.  Shake-and-Bake Algorithms for Generating Uniform Points on the Boundary of Bounded Polyhedra , 1991, Oper. Res..

[106]  A. Kennedy,et al.  Acceptances and autocorrelations in hybrid Monte Carlo , 1991 .

[107]  Wray L. Buntine,et al.  Bayesian Back-Propagation , 1991, Complex Syst..

[108]  J. A. Fill Eigenvalue bounds on convergence to stationarity for nonreversible markov chains , 1991 .

[109]  Charles J. Geyer,et al.  Practical Markov Chain Monte Carlo , 1992 .

[110]  J. Sexton,et al.  Hamiltonian evolution for the hybrid Monte Carlo algorithm , 1992 .

[111]  Sompolinsky,et al.  Statistical mechanics of learning from examples. , 1992, Physical review. A, Atomic, molecular, and optical physics.

[112]  J. Sexton,et al.  Hamiltonian evolution for the hybrid Monte Carlo algorithm , 1992 .

[113]  R. M. Oliver,et al.  Influence diagrams, belief nets and decision analysis , 1992 .

[114]  James O. Berger,et al.  Ockham's Razor and Bayesian Analysis , 1992 .

[115]  Wray L. Buntine,et al.  Learning classification trees , 1992 .

[116]  D. Earn,et al.  Exact numerical studies of Hamiltonian maps: iterating without roundoff error , 1992 .

[117]  W. Gilks,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 1992 .

[118]  Alan M. Ferrenberg,et al.  New Monte Carlo Methods for Improved Efficiency of Computer Simulations in Statistical Mechanics , 1992 .

[119]  D. Heermann,et al.  Parallel Algorithms for Statistical Physics Problems , 1992 .

[120]  D. Rubin,et al.  Inference from Iterative Simulation Using Multiple Sequences , 1992 .

[121]  Radford M. Neal Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[122]  M. Tanner,et al.  Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler , 1992 .

[123]  J. M. Sanz-Serna,et al.  Symplectic integrators for Hamiltonian problems: an overview , 1992, Acta Numerica.

[124]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[125]  Berg,et al.  New approach to spin-glass simulations. , 1992, Physical review letters.

[126]  M. West,et al.  A Bayesian method for classification and discrimination , 1992 .

[127]  Radford M. Neal Bayesian Mixture Modeling , 1992 .

[128]  Radford M. Neal Bayesian Learning via Stochastic Dynamics , 1992, NIPS.

[129]  L. Tierney Exploring Posterior Distributions Using Markov Chains , 1992 .

[130]  Radford M. Neal An improved acceptance procedure for the hybrid Monte Carlo algorithm , 1992, hep-lat/9208011.

[131]  G. Parisi,et al.  Simulated tempering: a new Monte Carlo scheme , 1992, hep-lat/9205018.

[132]  Radford M. Neal Bayesian training of backpropagation networks by the hybrid Monte-Carlo method , 1992 .

[133]  Jeremy York,et al.  Use of the Gibbs Sampler in Expert Systems , 1992, Artif. Intell..

[134]  D. Aldous Approximate Counting via Markov Chains , 1993 .

[135]  Alistair Sinclair,et al.  Algorithms for Random Generation and Counting: A Markov Chain Approach , 1993, Progress in Theoretical Computer Science.

[136]  R. Martin Chavez,et al.  Approximating Probabilistic Inference in Bayesian Belief Networks , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[137]  T. Watkin,et al.  THE STATISTICAL-MECHANICS OF LEARNING A RULE , 1993 .

[138]  Martin Abba Tanner,et al.  Tools for Statistical Inference: Observed Data and Data Augmentation Methods , 1993 .

[139]  Robert L. Smith,et al.  Hit-and-Run Algorithms for Generating Multivariate Distributions , 1993, Math. Oper. Res..

[140]  J. Besag,et al.  Spatial Statistics and Bayesian Computation , 1993 .

[141]  C. Hwang,et al.  Convergence rates of the Gibbs sampler, the Metropolis algorithm and other single-site updating dynamics , 1993 .

[142]  Bradley P. Carlin,et al.  Tools for Statistical Inference: Observed Data and Data Augmentation Methods (Martin A. Tanner) , 1993, SIAM Rev..

[143]  Michael Luby,et al.  Approximating Probabilistic Inference in Bayesian Belief Networks is NP-Hard , 1993, Artif. Intell..

[144]  H. H. Thodberg Ace of Bayes : Application of Neural , 1993 .

[145]  Peter Green,et al.  Spatial statistics and Bayesian computation (with discussion) , 1993 .

[146]  G. Kane Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .

[147]  G. Aeppli,et al.  Proceedings of the International School of Physics Enrico Fermi , 1994 .

[148]  L. Tierney Markov Chains for Exploring Posterior Distributions , 1994 .

[149]  Joong-Kweon Sohn,et al.  Convergence Diagnostics for the Gibbs Sampler , 1996 .

[150]  Mike Mannion,et al.  Complex systems , 1997, Proceedings International Conference and Workshop on Engineering of Computer-Based Systems.

[151]  Klaus Ritter,et al.  Bayesian numerical analysis , 2000 .

[152]  Christian Van den Broeck,et al.  Statistical Mechanics of Learning , 2001 .

[153]  Richard Szeliski,et al.  Bayesian modeling of uncertainty in low-level vision , 2011, International Journal of Computer Vision.

[154]  Christian P. Robert,et al.  Bayesian computational methods , 2010, 1002.2702.

[155]  John R. Anderson,et al.  Explorations of an Incremental, Bayesian Algorithm for Categorization , 1992, Machine Learning.

[156]  W. Hörmann,et al.  Monte Carlo Integration Using Importance Sampling and Gibbs Sampling , 2005 .

[157]  A. Gelman Iterative and Non-iterative Simulation Algorithms , 2006 .

[158]  H. Szu Fast simulated annealing , 1987 .

[159]  By W. R. GILKSt,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 2010 .

[160]  Peter Rossmanith,et al.  Simulated Annealing , 2008, Taschenbuch der Algorithmen.

[161]  Peter Cheeseman,et al.  Bayesian Methods for Adaptive Models , 2011 .

[162]  Kathryn A. Dowsland,et al.  Simulated Annealing , 1989, Handbook of Natural Computing.