Paper information - Continuous control with deep reinforcement learning

Continuous control with deep reinforcement learning

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
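The abstract names the deterministic policy gradient as the basis of the actor update. As a rough illustration only, and not the paper's implementation, the sketch below applies that update rule to a one-dimensional toy problem: the policy parameter is moved along the critic's action-gradient multiplied by the policy's parameter-gradient. Everything concrete here is an assumption made for the example: the fixed toy critic Q(s, a) = -(a - 2s)^2 (standing in for a learned critic network), the linear policy mu(s) = theta * s, and the names `critic_grad_wrt_action` and `train_actor`.

```python
import random


def critic_grad_wrt_action(state, action):
    """dQ/da for the hypothetical toy critic Q(s, a) = -(a - 2*s)**2."""
    return -2.0 * (action - 2.0 * state)


def train_actor(steps=500, batch_size=32, lr=0.1, seed=0):
    """Fit a linear deterministic policy mu(s) = theta * s by gradient ascent on Q."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        # Sample a minibatch of states (assumed uniform on [-1, 1] for this toy setup).
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        # Deterministic policy gradient estimate:
        # mean over the batch of dQ/da (evaluated at a = mu(s)) times dmu/dtheta (= s).
        grad = sum(
            critic_grad_wrt_action(s, theta * s) * s for s in states
        ) / batch_size
        theta += lr * grad
    return theta


theta = train_actor()
print(f"learned theta = {theta:.4f}")
```

In this toy setup the critic is maximized by a = 2s, so the learned `theta` should approach 2. A full actor-critic method of the kind the abstract describes would also have to learn the critic from experience rather than assume it is known.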
