Herded Gibbs Sampling

Abstract: The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence of herded Gibbs for sparsely connected probabilistic graphical models remains an open problem.
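To make the independent-variable case concrete, the following is a minimal sketch of a deterministic herding update for a single binary variable. It is an illustration under stated assumptions, not the paper's algorithm: the update rule (add the target probability to a weight, emit 1 and subtract 1 when the weight exceeds 1) is a common form of herding, and the function name `herded_bernoulli`, its parameters, and the initial weight of 0 are hypothetical choices made here.

```python
def herded_bernoulli(p, n_steps, w0=0.0):
    """Deterministically generate binary samples whose mean tracks p.

    Assumed update (not taken from the abstract): accumulate p into a
    weight w; when w exceeds 1, emit 1 and subtract 1, otherwise emit 0.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    w = w0
    samples = []
    for _ in range(n_steps):
        w += p              # accumulate the target probability
        if w > 1.0:
            samples.append(1)
            w -= 1.0        # pay back one unit of mass for emitting a 1
        else:
            samples.append(0)
    return samples


if __name__ == "__main__":
    # Independent variables: each one is herded with its own weight.
    for p in (0.3, 0.62):
        for n_steps in (10, 100, 1000):
            xs = herded_bernoulli(p, n_steps)
            err = abs(sum(xs) / n_steps - p)
            print(f"p={p} T={n_steps} error={err:.5f} bound={1 / n_steps:.5f}")
```

With `w0` in $[0, 1]$ the weight stays in $[0, 1]$, so after $T$ steps the number of ones differs from $Tp$ by at most 1 and the empirical mean is within $1/T$ of $p$. A random sampler would typically show an error on the order of $1/\sqrt{T}$ instead. Extending this to conditionals in a graphical model would require a separate weight per conditioning configuration, which this sketch does not attempt.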
