论文信息 - On Catastrophic Interference in Atari 2600 Games

On Catastrophic Interference in Atari 2600 Games

Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interference within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network cannot train on segments beyond the plateau without degrading the policy used to reach there. By synthetically controlling for interference, we demonstrate performance boosts across architectures, learning algorithms and environments. A more refined analysis shows that learning one segment of a game often increases prediction errors elsewhere. Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.

[1] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[2] Marc G. Bellemare,et al. Skip Context Tree Switching , 2014, ICML.

[3] Sergey Levine,et al. Divide-and-Conquer Reinforcement Learning , 2017, ICLR.

[4] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[5] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[6] Hugo Larochelle,et al. Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction , 2019, AAAI.

[7] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.

[8] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[9] Marlos C. Machado,et al. Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment , 2019, ArXiv.

[10] Stefan Wermter,et al. Continual Lifelong Learning with Neural Networks: A Review , 2019, Neural Networks.

[11] Conrad D. James,et al. Neurogenesis deep learning: Extending deep networks to accommodate new classes , 2016, 2017 International Joint Conference on Neural Networks (IJCNN).

[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[13] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[14] Honglak Lee,et al. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[16] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.

[17] R Ratcliff,et al. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. , 1990, Psychological review.

[18] Yee Whye Teh,et al. Continual Unsupervised Representation Learning , 2019, NeurIPS.

[19] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .

[20] R. French. Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[21] Derek Hoiem,et al. Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Razvan Pascanu,et al. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning , 2019, ArXiv.

[23] Joel Veness,et al. The Forget-me-not Process , 2016, NIPS.

[24] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[25] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[26] U. Rieder,et al. Markov Decision Processes , 2010 .

[27] Tinne Tuytelaars,et al. Task-Free Continual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[29] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.

[30] R. Bellman. A Markovian Decision Process , 1957 .

[31] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[32] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.

[33] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.

[34] Chrisantha Fernando,et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks , 2017, ArXiv.

[35] Qiang Yang,et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms , 2013, AAAI Spring Symposium: Lifelong Machine Learning.

[36] Sung Ju Hwang,et al. Lifelong Learning with Dynamically Expandable Networks , 2017, ICLR.

[37] Daniel Guo,et al. Agent57: Outperforming the Atari Human Benchmark , 2020, ICML.

[38] Surya Ganguli,et al. Continual Learning Through Synaptic Intelligence , 2017, ICML.

[39] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[40] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.

[41] Kenneth O. Stanley,et al. Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.

[42] Anthony V. Robins,et al. Catastrophic Forgetting, Rehearsal and Pseudorehearsal , 1995, Connect. Sci..

[43] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[44] Marlos C. Machado,et al. Generalization and Regularization in DQN , 2018, ArXiv.

[45] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.

[46] Marc'Aurelio Ranzato,et al. Gradient Episodic Memory for Continual Learning , 2017, NIPS.

[47] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).