Dale Schuurmans | Mohammad Norouzi | Kelvin Xu | Ofir Nachum
[1] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[2] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[3] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[4] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[5] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[6] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[7] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[8] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[9] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[10] Dale Schuurmans, et al. Reward Augmented Maximum Likelihood for Neural Structured Prediction, 2016, NIPS.
[11] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[12] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[13] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[14] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[15] Hilbert J. Kappen, et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[16] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[17] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[18] Navdeep Jaitly, et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, ArXiv.
[19] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[20] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[21] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[22] Jeff G. Schneider, et al. Covariant Policy Search, 2003, IJCAI.
[23] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[24] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.
[25] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[26] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[27] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks (Special Issue).
[28] Vicenç Gómez, et al. Dynamic Policy Programming with Function Approximation, 2011, AISTATS.
[29] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[30] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[31] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[32] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[33] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[34] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[35] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.