Nando de Freitas | Rémi Munos | Koray Kavukcuoglu | Ziyu Wang | Nicolas Heess | Volodymyr Mnih | Victor Bapst