Continual Learning with Bayesian Neural Networks for Non-Stationary Data

This work addresses continual learning for non-stationary data using Bayesian neural networks and memory-based online variational Bayes. We represent the approximate posterior over the network weights by a diagonal Gaussian distribution together with a complementary memory of raw data; the memory stores the data points whose likelihood terms the Gaussian cannot approximate well. We introduce a novel method for sequentially updating both components of this posterior approximation. Furthermore, we propose two mechanisms for adapting to non-stationarity: Bayesian forgetting and a Gaussian diffusion process over the weights. Experiments show that our update method improves on existing approaches for streaming data, and that the adaptation mechanisms yield better predictive performance on non-stationary data.
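To make the two adaptation mechanisms concrete, below is a minimal sketch of how Bayesian forgetting and a Gaussian diffusion step could act on a diagonal Gaussian posterior over the weights. This is an illustration under stated assumptions, not the paper's implementation: the function names and the hyperparameters `beta` (forgetting strength) and `tau` (diffusion scale) are hypothetical.

```python
import numpy as np

def bayesian_forgetting(mu, var, prior_mu, prior_var, beta):
    """Temper the posterior toward the prior: q(w)^(1 - beta) * p(w)^beta.

    For Gaussians this interpolates the natural parameters, i.e. the
    precisions and the precision-weighted means (beta in [0, 1];
    beta = 0 keeps the posterior unchanged, beta = 1 resets to the prior).
    """
    prec, prior_prec = 1.0 / var, 1.0 / prior_var
    new_prec = (1.0 - beta) * prec + beta * prior_prec
    new_mu = ((1.0 - beta) * prec * mu + beta * prior_prec * prior_mu) / new_prec
    return new_mu, 1.0 / new_prec

def gaussian_diffusion(mu, var, tau):
    """Propagate the posterior through a Gaussian random walk.

    Assuming w_t = w_{t-1} + eps with eps ~ N(0, tau^2), the predictive
    distribution keeps the mean and inflates each marginal variance.
    """
    return mu, var + tau ** 2

# Example: adapt a per-weight posterior before processing the next batch.
mu = np.array([0.8, -1.2])      # posterior means
var = np.array([0.05, 0.10])    # posterior variances
mu, var = bayesian_forgetting(mu, var, prior_mu=0.0, prior_var=1.0, beta=0.05)
mu, var = gaussian_diffusion(mu, var, tau=0.01)
```

Both operations increase posterior uncertainty between updates, which lets subsequent variational steps on new data move the weights more freely; forgetting additionally pulls the mean back toward the prior, while diffusion only inflates the variance.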
