Acme: A Research Framework for Distributed Reinforcement Learning

Deep reinforcement learning (RL) has led to many recent, groundbreaking advances. However, these advances have often come at the cost of increased scale and complexity in the underlying RL algorithms. This complexity has in turn made it more difficult for researchers to reproduce published RL algorithms or to rapidly prototype ideas. To address this, we introduce Acme, a tool that simplifies the development of novel RL algorithms and is specifically designed to enable simple agent implementations that can be run at various scales of execution. We also aim to make the results of RL algorithms developed in academic and industrial labs easier to reproduce and extend. To this end, we are releasing baseline implementations of various algorithms created using our framework. In this work we present the major design decisions behind Acme and show how they are used to construct these baselines. We also experiment with these agents at different scales of both complexity and computation, including distributed versions. Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.
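
The idea of running one agent implementation "at various scales of execution" can be illustrated with a toy example. The abstract does not spell out Acme's components, so the sketch below assumes a decomposition into actors that generate experience, a shared replay buffer, and a learner that updates parameters from sampled experience. The classes `ReplayBuffer`, `Actor`, `Learner` and the function `run` are hypothetical and are not Acme's API; the task is a made-up two-armed bandit. The only thing that changes between a small and a larger run is `num_actors`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical bounded FIFO store shared by all actors and the learner."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size, rng):
        items = list(self._data)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self._data)


class Actor:
    """Hypothetical actor: epsilon-greedy on a 2-armed bandit where arm 1 pays 1.0."""

    def __init__(self, replay, rng, epsilon=0.2):
        self._replay = replay
        self._rng = rng
        self._epsilon = epsilon

    def step(self, q):
        if self._rng.random() < self._epsilon:
            action = self._rng.randrange(2)  # explore uniformly
        else:
            action = max(range(2), key=lambda a: q[a])  # greedy; ties go to arm 0
        reward = 1.0 if action == 1 else 0.0
        self._replay.add((action, reward))


class Learner:
    """Hypothetical learner: moves each sampled arm's value toward its reward."""

    def __init__(self, replay, rng, lr=0.1):
        self._replay = replay
        self._rng = rng
        self._lr = lr
        self.q = [0.0, 0.0]

    def step(self, batch_size=32):
        for action, reward in self._replay.sample(batch_size, self._rng):
            self.q[action] += self._lr * (reward - self.q[action])


def run(num_actors, num_steps, seed=0):
    """Builds the same components at any scale; only the actor count differs."""
    replay = ReplayBuffer(capacity=100_000)
    learner = Learner(replay, random.Random(seed))
    actors = [Actor(replay, random.Random(seed + i + 1)) for i in range(num_actors)]
    for _ in range(num_steps):
        # Here actors run in turn in one process; a distributed runner would
        # instead place each actor in its own process and sync parameters.
        for actor in actors:
            actor.step(learner.q)
        learner.step()
    return learner.q, len(replay)


for n in (1, 4):
    q, collected = run(num_actors=n, num_steps=500)
    print(n, collected, [round(v, 2) for v in q])
```

In this sketch more actors only means more experience per learner step; the actor, learner, and replay code is unchanged, which is the property the abstract attributes to Acme's design.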