A Deeper Look at Experience Replay

Experience replay has recently become widely used in deep reinforcement learning (RL) algorithms; in this paper, we rethink its utility. Experience replay introduces a new hyper-parameter, the memory buffer size, which needs careful tuning, yet the importance of this hyper-parameter has long been underestimated in the community. We conduct a systematic empirical study of experience replay under various function representations and show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to mitigate the negative influence of a large replay buffer, and we demonstrate its utility both in a simple grid world and in challenging domains such as Atari games.
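
For illustration, below is a minimal Python sketch of a uniform replay buffer that exposes the buffer-size hyper-parameter the abstract discusses. The ReplayBuffer class, its capacity parameter, and the combine_latest option (always mixing the newest transition into each sampled mini-batch, one possible O(1)-cost remedy for a large buffer) are hypothetical names introduced here for illustration; the abstract does not spell out the paper's exact method.

    import random
    from collections import deque


    class ReplayBuffer:
        """Uniform experience replay with a fixed capacity (the buffer-size hyper-parameter)."""

        def __init__(self, capacity):
            # A deque with maxlen evicts the oldest transition in O(1) once the buffer is full.
            self.buffer = deque(maxlen=capacity)

        def add(self, transition):
            # transition is a (state, action, reward, next_state, done) tuple.
            self.buffer.append(transition)

        def sample(self, batch_size, combine_latest=False):
            # Uniformly sample a mini-batch. With combine_latest=True, the newest
            # transition is always included: an O(1)-cost tweak in the spirit of
            # the remedy the abstract mentions (an assumption, not the paper's spec).
            if combine_latest:
                batch = random.sample(self.buffer, batch_size - 1)
                batch.append(self.buffer[-1])  # O(1) access to the latest transition
                return batch
            return random.sample(self.buffer, batch_size)

In a DQN-style agent, add would be called after every environment step and sample once per gradient update; both shrinking capacity and enabling combine_latest reduce the staleness of the data the learner sees.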
