A Deeper Look at Experience Replay

Experience replay has recently become widely used in deep reinforcement learning (RL) algorithms; in this paper, we rethink its utility. Experience replay introduces a new hyper-parameter, the memory buffer size, which needs careful tuning, yet the importance of this hyper-parameter has long been underestimated in the community. We conduct a systematic empirical study of experience replay under various function representations and show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to mitigate the negative influence of a large replay buffer, and we demonstrate its utility both in a simple grid world and in challenging domains such as Atari games.
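
For illustration, below is a minimal Python sketch of a uniform replay buffer that exposes the buffer-size hyper-parameter the abstract discusses. The ReplayBuffer class, its capacity parameter, and the combine_latest option (always mixing the newest transition into each sampled mini-batch, one possible O(1)-cost remedy for a large buffer) are hypothetical names introduced here for illustration; the abstract does not spell out the paper's exact method.

    import random
    from collections import deque


    class ReplayBuffer:
        """Uniform experience replay with a fixed capacity (the buffer-size hyper-parameter)."""

        def __init__(self, capacity):
            # A deque with maxlen evicts the oldest transition in O(1) once the buffer is full.
            self.buffer = deque(maxlen=capacity)

        def add(self, transition):
            # transition is a (state, action, reward, next_state, done) tuple.
            self.buffer.append(transition)

        def sample(self, batch_size, combine_latest=False):
            # Uniformly sample a mini-batch. With combine_latest=True, the newest
            # transition is always included: an O(1)-cost tweak in the spirit of
            # the remedy the abstract mentions (an assumption, not the paper's spec).
            if combine_latest:
                batch = random.sample(self.buffer, batch_size - 1)
                batch.append(self.buffer[-1])  # O(1) access to the latest transition
                return batch
            return random.sample(self.buffer, batch_size)

In a DQN-style agent, add would be called after every environment step and sample once per gradient update; both shrinking capacity and enabling combine_latest reduce the staleness of the data the learner sees.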
