Variational Consensus Monte Carlo

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically serial, and do not scale to the large datasets typical of modern machine learning. The recently proposed consensus Monte Carlo algorithm removes this limitation by partitioning the data and drawing samples conditional on each partition in parallel [26]. A fixed aggregation function then combines these samples, yielding approximate posterior samples. We introduce variational consensus Monte Carlo (VCMC), a variational Bayes algorithm that optimizes over aggregation functions to obtain samples from a distribution that better approximates the target. The resulting objective contains an intractable entropy term; we therefore derive a relaxation of the objective and show that the relaxed problem is blockwise concave under mild conditions. We illustrate the advantages of our algorithm on three inference tasks from the literature, demonstrating both the superior quality of the posterior approximation and the moderate overhead of the optimization step. Our algorithm achieves a relative error reduction (measured against serial MCMC) of up to 39% compared to consensus Monte Carlo on the task of estimating 300-dimensional probit regression parameter expectations; similarly, it achieves an error reduction of 92% on the task of estimating cluster comembership probabilities in a Gaussian mixture model with 8 components in 8 dimensions. Furthermore, these gains come at moderate cost compared to the runtime of serial MCMC, achieving near-ideal speedup in some instances.
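To make the aggregation step concrete, the sketch below (ours, not the authors' code) shows the fixed combination rule that VCMC generalizes: each worker draws samples from a subposterior conditioned on its data partition, and matching draws are averaged with the Gaussian-motivated inverse-covariance weights used in consensus Monte Carlo [26]. VCMC instead treats the aggregation function as a variational parameter and optimizes it. The function name `consensus_aggregate` and the toy data are illustrative assumptions.

```python
# Minimal sketch of consensus Monte Carlo aggregation, assuming only NumPy.
# Each entry of subposterior_samples is an (S, D) array of draws from one
# partition's subposterior; the s-th draws are combined across partitions.
import numpy as np

def consensus_aggregate(subposterior_samples):
    """Combine per-partition draws into approximate posterior draws."""
    # Weight each partition by the inverse of its subposterior sample
    # covariance; this choice is exact when every subposterior is Gaussian.
    weights = [np.linalg.inv(np.cov(samples, rowvar=False))
               for samples in subposterior_samples]
    total = np.linalg.inv(sum(weights))
    num_draws = subposterior_samples[0].shape[0]
    # Align the s-th draw from every partition and take the weighted average.
    combined = [
        total @ sum(W @ samples[s]
                    for W, samples in zip(weights, subposterior_samples))
        for s in range(num_draws)
    ]
    return np.asarray(combined)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic subposterior draws from K = 4 partitions in D = 3 dimensions.
    draws = [rng.normal(loc=0.1 * k, scale=1.0, size=(500, 3)) for k in range(4)]
    print(consensus_aggregate(draws).mean(axis=0))
```

In VCMC, the fixed weights above would be replaced by aggregation parameters chosen to maximize the (relaxed) variational objective described in the abstract.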

[1] Thomas M. Cover, et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2006.

[2] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[3] John G. van Bosse, et al. Wiley Series in Telecommunications and Signal Processing, 2006.

[4] Max Welling, et al. Asynchronous Distributed Learning of Topic Models, 2008, NIPS.

[5] Zoubin Ghahramani, et al. Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process, 2009, NIPS.

[6] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[7] John K. Kruschke, et al. Bayesian data analysis, 2010, Wiley Interdisciplinary Reviews: Cognitive Science.

[8] Harold W. Kuhn, et al. The Hungarian Method for the Assignment Problem, 1955, 50 Years of Integer Programming.

[9] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.

[10] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[11] Matthew J. Johnson, et al. Analyzing Hogwild Parallel Gaussian Gibbs Sampling, 2013, NIPS.

[12] David B. Dunson, et al. Bayesian Data Analysis, Third Edition, 2013.

[13] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[14] Chong Wang, et al. An Adaptive Learning Rate for Stochastic Variational Inference, 2013, ICML.

[15] Xiangyu Wang, et al. Parallel MCMC via Weierstrass Sampler, 2013, arXiv.

[16] Max Welling, et al. Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget, 2013, ICML 2014.

[17] Chong Wang, et al. Stochastic Variational Inference, 2012, J. Mach. Learn. Res.

[18] Andre Wibisono, et al. Streaming Variational Bayes, 2013, NIPS.

[19] Jonathan P. How, et al. Approximate Decentralized Bayesian Inference, 2014, UAI.

[20] Chong Wang, et al. Asymptotically Exact, Embarrassingly Parallel MCMC, 2013, UAI.

[21] Ryan P. Adams, et al. Parallel MCMC with Generalized Elliptical Slice Sampling, 2012, J. Mach. Learn. Res.

[22] Arnaud Doucet, et al. Towards Scaling Up Markov Chain Monte Carlo: An Adaptive Subsampling Approach, 2014, ICML.

[23] Ryan P. Adams, et al. Firefly Monte Carlo: Exact MCMC with Subsets of Data, 2014, UAI.

[24] David M. Blei, et al. Smoothed Gradients for Stochastic Variational Inference, 2014, NIPS.

[25] Mark A. Girolami, et al. Unbiased Bayes for Big Data: Paths of Partial Posteriors, 2015, arXiv.

[26] Edward I. George, et al. Bayes and Big Data: The Consensus Monte Carlo Algorithm, 2016, Big Data and Information Theory.

[27] K. Schittkowski, et al. Nonlinear Programming, 2022.