Sergey Levine | Richard E. Turner | Zoubin Ghahramani | Shixiang Gu | Timothy P. Lillicrap
[1] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[2] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[3] Christopher G. Atkeson, et al. A Comparison of Direct and Model-Based Reinforcement Learning, 1997, Proceedings of the International Conference on Robotics and Automation.
[4] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[5] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[6] Lei Xu, et al. Input Convex Neural Networks, 2017, ICML.
[7] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[8] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[10] Omer Levy, et al. Simulating Action Dynamics with Neural Process Networks, 2018, ICLR.
[11] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[12] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[13] Shalabh Bhatnagar, et al. Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation, 2009, ICML.
[14] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[15] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[16] Demis Hassabis, et al. Mastering the Game of Go with Deep Neural Networks and Tree Search, 2016, Nature.
[17] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[18] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[19] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[20] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[21] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[22] Sergey Levine, et al. Continuous Deep Q-Learning with Model-Based Acceleration, 2016, ICML.
[23] Sergey Levine, et al. MuProp: Unbiased Backpropagation for Stochastic Neural Networks, 2015, ICLR.
[24] Martha White, et al. An Emphatic Approach to the Problem of Off-Policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[25] Karol Gregor, et al. Neural Variational Inference and Learning in Belief Networks, 2014, ICML.
[26] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[27] Yuval Tassa, et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[28] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[29] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[30] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[31] Xinyun Chen, et al. Delving into Transferable Adversarial Examples and Black-Box Attacks, 2017, ICLR.
[32] Richard S. Sutton, et al. Weighted Importance Sampling for Off-Policy Learning with Linear Function Approximation, 2014, NIPS.
[33] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[34] Michael I. Jordan, et al. Variational Bayesian Inference with Stochastic Search, 2012, ICML.
[35] Shane Legg, et al. Human-Level Control through Deep Reinforcement Learning, 2015, Nature.
[36] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.