Prioritized Experience Replay

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were sampled uniformly from a replay memory. However, this approach simply replays transitions at the same frequency at which they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state of the art, outperforming DQN with uniform replay on 41 out of 49 games.
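To make the core idea concrete, here is a minimal, illustrative sketch of proportional prioritization: transitions are sampled with probability proportional to their priority (typically derived from the TD error), and importance-sampling weights correct the resulting bias. The class name, parameter defaults, and linear-array storage are assumptions for illustration, not the paper's implementation, which uses a sum-tree for efficient sampling.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Illustrative proportional-prioritization buffer (hypothetical sketch).

    P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |TD error| + eps.
    Importance-sampling weights w_i = (N * P(i))^(-beta) correct the
    bias introduced by non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0   # next write index (circular buffer)
        self.size = 0  # number of stored transitions

    def add(self, transition):
        # New transitions get the current maximum priority so that
        # every transition is replayed at least once.
        max_p = self.priorities[: self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        scaled = self.priorities[: self.size] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights, normalized by their max for stability.
        weights = (self.size * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Called after a learning step with the new TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

This linear-array version costs O(N) per sample; an efficient implementation would replace the priority array with a sum-tree to get O(log N) updates and sampling.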
