Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

We introduce TrustVI, a fast second-order algorithm for black-box variational inference based on trust-region optimization and the reparameterization trick. At each iteration, TrustVI proposes and assesses a step based on minibatches of draws from the variational distribution. The algorithm provably converges to a stationary point. We implemented TrustVI in the Stan framework and compared it to two alternatives: Automatic Differentiation Variational Inference (ADVI) and Hessian-free Stochastic Gradient Variational Inference (HFSGVI). The former is based on stochastic first-order optimization. The latter uses second-order information, but lacks convergence guarantees. TrustVI typically converged at least one order of magnitude faster than ADVI, demonstrating the value of stochastic second-order information. TrustVI often found substantially better variational distributions than HFSGVI, demonstrating that our convergence theory can matter in practice.
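The propose-and-assess iteration described above can be sketched as a stochastic trust-region loop over the parameters of a Gaussian variational distribution. The sketch below is a first-order illustration under assumed choices (a standard-normal target, a steepest-ascent step clipped to the trust radius, and common random numbers for the accept test); TrustVI itself also uses second-order information when solving the trust-region subproblem, and none of the names or constants here come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def logp(z):
    # Illustrative target: standard normal log-density (up to a constant).
    return -0.5 * z**2

def elbo_and_grad(mu, logsig, eps):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # so the minibatch ELBO estimate is differentiable in (mu, logsig).
    sig = np.exp(logsig)
    z = mu + sig * eps
    elbo = np.mean(logp(z)) + logsig          # entropy of N(mu, sig^2), up to a constant
    g_mu = np.mean(-z)                        # d/dmu of the minibatch term
    g_logsig = np.mean(-z * sig * eps) + 1.0  # d/dlogsig, including the entropy term
    return elbo, np.array([g_mu, g_logsig])

theta = np.array([3.0, 1.0])   # (mu, log sigma), started far from the optimum
radius = 1.0
for _ in range(200):
    # Propose: estimate a gradient from a minibatch of draws and take a
    # steepest-ascent step clipped to the current trust radius.
    eps = rng.standard_normal(64)
    _, g = elbo_and_grad(theta[0], theta[1], eps)
    step = radius * g / (np.linalg.norm(g) + 1e-12)
    # Assess: compare old and new iterates on a fresh minibatch, using
    # common random numbers to reduce the variance of the comparison.
    eps2 = rng.standard_normal(64)
    old, _ = elbo_and_grad(theta[0], theta[1], eps2)
    new, _ = elbo_and_grad(theta[0] + step[0], theta[1] + step[1], eps2)
    if new > old:                  # accept the step and grow the region
        theta = theta + step
        radius = min(2.0 * radius, 10.0)
    else:                          # reject the step and shrink the region
        radius = 0.5 * radius

mu, logsig = theta
```

Because the target here is N(0, 1), the optimal variational parameters are mu = 0 and log sigma = 0, and the loop settles near them; the accept/reject rule plays the role of the stochastic ratio test that underlies the paper's convergence analysis.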
