Continual Learning with Bayesian Neural Networks for Non-Stationary Data

This work addresses continual learning for non-stationary data using Bayesian neural networks and memory-based online variational Bayes. We represent the approximate posterior over the network weights by a diagonal Gaussian distribution together with a complementary memory of raw data; the memory stores the data points whose likelihood terms the Gaussian cannot approximate well. We introduce a novel method for sequentially updating both components of this posterior approximation. Furthermore, we propose two mechanisms for adapting to non-stationarity: Bayesian forgetting and a Gaussian diffusion process over the weights. Experiments show that our update method improves on existing approaches for streaming data, and that the adaptation mechanisms yield better predictive performance on non-stationary data.
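To make the two adaptation mechanisms concrete, below is a minimal sketch of how Bayesian forgetting and a Gaussian diffusion step could act on a diagonal Gaussian posterior over the weights. This is an illustration under stated assumptions, not the paper's implementation: the function names and the hyperparameters `beta` (forgetting strength) and `tau` (diffusion scale) are hypothetical.

```python
import numpy as np

def bayesian_forgetting(mu, var, prior_mu, prior_var, beta):
    """Temper the posterior toward the prior: q(w)^(1 - beta) * p(w)^beta.

    For Gaussians this interpolates the natural parameters, i.e. the
    precisions and the precision-weighted means (beta in [0, 1];
    beta = 0 keeps the posterior unchanged, beta = 1 resets to the prior).
    """
    prec, prior_prec = 1.0 / var, 1.0 / prior_var
    new_prec = (1.0 - beta) * prec + beta * prior_prec
    new_mu = ((1.0 - beta) * prec * mu + beta * prior_prec * prior_mu) / new_prec
    return new_mu, 1.0 / new_prec

def gaussian_diffusion(mu, var, tau):
    """Propagate the posterior through a Gaussian random walk.

    Assuming w_t = w_{t-1} + eps with eps ~ N(0, tau^2), the predictive
    distribution keeps the mean and inflates each marginal variance.
    """
    return mu, var + tau ** 2

# Example: adapt a per-weight posterior before processing the next batch.
mu = np.array([0.8, -1.2])      # posterior means
var = np.array([0.05, 0.10])    # posterior variances
mu, var = bayesian_forgetting(mu, var, prior_mu=0.0, prior_var=1.0, beta=0.05)
mu, var = gaussian_diffusion(mu, var, tau=0.01)
```

Both operations increase posterior uncertainty between updates, which lets subsequent variational steps on new data move the weights more freely; forgetting additionally pulls the mean back toward the prior, while diffusion only inflates the variance.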
