An Introduction to MCMC for Machine Learning

The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses interesting new research horizons.
