Paper Information - Massively Parallel Methods for Deep Reinforcement Learning

Massively Parallel Methods for Deep Reinforcement Learning

We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
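The four components named in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the class names (`ParameterServer`, `ReplayMemory`, `ChainEnv`, `Actor`, `Learner`) and the `train` function are hypothetical, and a lookup table stands in for the distributed neural network. A toy chain environment stands in for an Atari game, a round-robin loop stands in for real parallelism, and the target network used by DQN is omitted for brevity.

```python
import random
from collections import deque


class ParameterServer:
    """Shared Q-function parameters (a plain table here, not a neural network)."""

    def __init__(self, n_states, n_actions):
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def get_params(self):
        # Return a copy, as a remote actor or learner would receive a snapshot.
        return [row[:] for row in self.q]

    def apply_gradient(self, state, action, grad, lr):
        self.q[state][action] -= lr * grad


class ReplayMemory:
    """Bounded store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))


class ChainEnv:
    """Toy stand-in for a game: walk right along a chain to earn reward 1."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left (floored at 0); action 1 moves right.
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


class Actor:
    """Generates behaviour with an epsilon-greedy policy and stores it."""

    def __init__(self, env, server, memory, epsilon):
        self.env, self.server, self.memory = env, server, memory
        self.epsilon = epsilon
        self.state = env.reset()

    def act(self):
        q = self.server.get_params()[self.state]
        if random.random() < self.epsilon:
            action = random.randrange(len(q))
        else:
            best = max(q)
            action = random.choice([a for a, v in enumerate(q) if v == best])
        next_state, reward, done = self.env.step(action)
        self.memory.add((self.state, action, reward, next_state, done))
        self.state = self.env.reset() if done else next_state


class Learner:
    """Samples stored experience and sends Q-learning gradients to the server."""

    def __init__(self, server, memory, gamma=0.9):
        self.server, self.memory, self.gamma = server, memory, gamma

    def learn(self, batch_size, lr):
        params = self.server.get_params()
        for s, a, r, s2, done in self.memory.sample(batch_size):
            target = r if done else r + self.gamma * max(params[s2])
            grad = params[s][a] - target  # d/dq of 0.5 * (q - target)^2
            self.server.apply_gradient(s, a, grad, lr)


def train(n_actors=4, n_learners=2, steps=2000, seed=0):
    random.seed(seed)
    server = ParameterServer(n_states=5, n_actions=2)
    memory = ReplayMemory(capacity=1000)
    actors = [Actor(ChainEnv(), server, memory, epsilon=0.3) for _ in range(n_actors)]
    learners = [Learner(server, memory) for _ in range(n_learners)]
    for _ in range(steps):  # round-robin loop standing in for real parallelism
        for actor in actors:
            actor.act()
        for learner in learners:
            learner.learn(batch_size=8, lr=0.1)
    return server.get_params()
```

Calling `train()` returns the learned table of action values. In this toy setting, the value of moving right should end up higher than the value of moving left in every non-terminal state. The actor count, learner count, and learning rate are illustrative defaults, not values taken from the paper.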
