Bayesian optimization for adaptive MCMC

This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
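
As an illustration of the exploration-exploitation trade-off mentioned above, the following is a minimal sketch of a generic Bayesian optimization loop. It is not the paper's randomized strategy. The abstract does not specify a surrogate model or acquisition function, so the Gaussian process surrogate, the expected-improvement rule, the kernel length scale, and all function names here are assumptions. The objective `toy_mixing_score` is a hypothetical, non-differentiable stand-in for a measure of chain mixing as a function of a single proposal parameter; in a real adaptive MCMC setting each evaluation would require running the sampler.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D point sets (assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    weights = np.linalg.solve(k_oo, k_on)           # K^{-1} k_*
    mean = weights.T @ y_obs
    var = 1.0 - np.sum(k_on * weights, axis=0)      # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent `best` (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    # First term rewards high predicted values (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, candidates, n_init=3, n_iter=10, seed=0):
    """Evaluate `objective` at n_init random points, then n_iter EI-chosen points."""
    rng = np.random.default_rng(seed)
    x_obs = rng.choice(candidates, size=n_init, replace=False)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    i_best = int(np.argmax(y_obs))
    return float(x_obs[i_best]), float(y_obs[i_best]), len(y_obs)

def toy_mixing_score(step_size):
    """Hypothetical non-differentiable objective with its peak at 0.6."""
    return -abs(step_size - 0.6)

candidates = np.linspace(0.0, 1.0, 101)
best_x, best_y, n_evals = bayes_opt(toy_mixing_score, candidates)
print(best_x, best_y, n_evals)
```

The loop calls the objective only `n_init + n_iter` times (13 with the defaults), which is the point of using a surrogate when each evaluation is costly. Restricting the search to a fixed candidate grid keeps the sketch short; a continuous acquisition optimizer would be the more usual choice in practice.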
