Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes

Mean field variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, a well-known major failing of MFVB is that it underestimates the uncertainty of model variables (sometimes severely) and provides no information about model variable covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for model variables, both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB). When the MFVB posterior approximation is in the exponential family, LRVB has a simple, analytic form, even for non-conjugate models. Indeed, we make no assumptions about the form of the true posterior. We demonstrate the accuracy and scalability of our method on a range of models for both simulated and real data.
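
To make the variance underestimation and the linear response idea concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation and does not use the analytic LRVB form. It assumes a bivariate Gaussian target chosen purely for illustration, and the function names `mfvb_gaussian` and `linear_response_cov` are hypothetical. The sketch relies on the standard fact that adding a small perturbation `t @ theta` to the log density shifts the mean by approximately the covariance times `t`, so differentiating the MFVB means with respect to `t` (here by finite differences) yields a covariance estimate.

```python
import numpy as np

def mfvb_gaussian(mu, Lam, t=None, n_sweeps=200):
    """Coordinate-ascent MFVB for the target N(mu, inv(Lam)), optionally
    tilted by exp(t @ theta). Returns factor means and factor variances."""
    d = len(mu)
    t = np.zeros(d) if t is None else t
    m = np.zeros(d)
    for _ in range(n_sweeps):
        for i in range(d):
            # Sum over j != i of Lam[i, j] * (m[j] - mu[j])
            off = Lam[i] @ (m - mu) - Lam[i, i] * (m[i] - mu[i])
            # The optimal factor q_i is Gaussian with precision Lam[i, i]
            m[i] = mu[i] + (t[i] - off) / Lam[i, i]
    return m, 1.0 / np.diag(Lam)

def linear_response_cov(mu, Lam, eps=1e-3):
    """Estimate the covariance as d(MFVB mean)/dt via finite differences."""
    d = len(mu)
    m0, _ = mfvb_gaussian(mu, Lam)
    cov = np.empty((d, d))
    for j in range(d):
        t = np.zeros(d)
        t[j] = eps
        mj, _ = mfvb_gaussian(mu, Lam, t)
        cov[:, j] = (mj - m0) / eps
    return cov

# Illustrative (assumed) target: unit variances, correlation 0.8
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
mu = np.array([1.0, -2.0])
Lam = np.linalg.inv(Sigma)

_, mf_var = mfvb_gaussian(mu, Lam)
lr_cov = linear_response_cov(mu, Lam)

print("true covariance:\n", Sigma)
print("MFVB variances:", mf_var)
print("linear response covariance:\n", lr_cov)
```

In this toy case the MFVB factor variances are 0.36 against true marginal variances of 1, while the perturbation-based estimate recovers the full covariance matrix, including the off-diagonal terms that the mean field factorization discards. A Gaussian target is a special case where the recovery is exact; for other models the estimate would only be approximate.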
