Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models

Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned: while the true posteriors depend on, e.g., the entire sequence of observations, the approximate posteriors are informed only by past observations. This mimics the Bayesian filter, which is a mixture of smoothing posteriors. Yet we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead, compromising the learned generative model. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. With fully-conditioned approximate posteriors, performance improves in terms of both generative modelling and multi-step prediction.
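To make the gap concrete, here is a brief sketch under assumed notation (observations $x_{1:T}$, latent states $z_{1:T}$); the derivation is a standard variational-calculus argument, not a quotation from the paper. The exact filtering posterior marginalises over future observations and is therefore a mixture of smoothing posteriors,

\[
p(z_t \mid x_{1:t}) \;=\; \mathbb{E}_{p(x_{t+1:T} \mid x_{1:t})}\!\left[\, p(z_t \mid x_{1:T}) \,\right].
\]

An amortised posterior $q(z_t \mid x_{1:t})$, by contrast, is shared across all continuations $x_{t+1:T}$, so maximising the expected ELBO amounts to minimising $\mathbb{E}_{p(x_{t+1:T} \mid x_{1:t})}\,\mathrm{KL}\big(q(z_t \mid x_{1:t}) \,\|\, p(z_t \mid x_{1:T})\big)$, whose stationary point is

\[
q^{\star}(z_t \mid x_{1:t}) \;\propto\; \exp\!\Big(\mathbb{E}_{p(x_{t+1:T} \mid x_{1:t})}\big[\log p(z_t \mid x_{1:T})\big]\Big),
\]

a normalised (geometric) product of smoothing posteriors rather than their mixture. The two coincide only when the smoothing posterior does not vary with the future observations, i.e. when the past is already sufficient.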
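The fix suggested by the abstract is to condition the approximate posterior on the full sequence, e.g. via a bidirectional recurrent encoder. The following is a minimal PyTorch sketch of the two conditioning schemes, not the paper's implementation; all class and variable names are illustrative.

import torch
import torch.nn as nn

class PartiallyConditionedEncoder(nn.Module):
    """Amortised q(z_t | x_{1:t}): a causal (unidirectional) RNN sees only the past."""
    def __init__(self, x_dim, z_dim, h_dim=64):
        super().__init__()
        self.rnn = nn.GRU(x_dim, h_dim, batch_first=True)   # unidirectional -> causal
        self.loc = nn.Linear(h_dim, z_dim)
        self.log_scale = nn.Linear(h_dim, z_dim)

    def forward(self, x):                                   # x: (batch, T, x_dim)
        h, _ = self.rnn(x)                                  # h_t summarises x_{1:t}
        return torch.distributions.Normal(self.loc(h), self.log_scale(h).exp())

class FullyConditionedEncoder(nn.Module):
    """Amortised q(z_t | x_{1:T}): a bidirectional RNN sees the whole sequence."""
    def __init__(self, x_dim, z_dim, h_dim=64):
        super().__init__()
        self.rnn = nn.GRU(x_dim, h_dim, batch_first=True, bidirectional=True)
        self.loc = nn.Linear(2 * h_dim, z_dim)
        self.log_scale = nn.Linear(2 * h_dim, z_dim)

    def forward(self, x):
        h, _ = self.rnn(x)                                  # h_t summarises x_{1:T}
        return torch.distributions.Normal(self.loc(h), self.log_scale(h).exp())

x = torch.randn(8, 20, 3)                                   # 8 sequences, T=20, x_dim=3
q_filter = PartiallyConditionedEncoder(3, 2)(x)
q_smooth = FullyConditionedEncoder(3, 2)(x)
print(q_filter.mean.shape, q_smooth.mean.shape)             # both (8, 20, 2)

Both encoders produce a diagonal Gaussian per time step; only the conditioning set differs. Per the result above, only the fully-conditioned variant can match the true smoothing posteriors that the ELBO targets.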
