TD-regularized actor-critic methods
Simone Parisi | Voot Tangkaratt | Jan Peters | Mohammad Emtiyaz Khan
[1] H. Robbins. A Stochastic Approximation Method, 1951.
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[4] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[5] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[6] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[7] Le Song,et al. Boosting the Actor with Dual Critic , 2017, ICLR.
[8] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[9] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[10] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[11] Sergey Levine,et al. MuProp: Unbiased Backpropagation for Stochastic Neural Networks , 2015, ICLR.
[12] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[13] Alexandre M. Bayen,et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines , 2018, ICLR.
[14] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[15] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[16] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[17] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[18] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[19] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[20] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[21] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[22] Ron Meir,et al. Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation , 2008, NIPS.
[23] Hany Abdulsamad,et al. Model-Free Trajectory Optimization for Reinforcement Learning , 2016, ICML.
[24] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[25] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[26] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[27] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[28] Jan Peters,et al. f-Divergence constrained policy improvement , 2017, ArXiv.
[29] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[30] Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2017, ICLR.
[31] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[32] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[33] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[34] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[35] Tapani Raiko,et al. International Conference on Learning Representations (ICLR), 2016.
[36] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[37] Stephen J. Wright,et al. Numerical Optimization (Springer Series in Operations Research and Financial Engineering), 2000.
[38] Yasemin Altun,et al. Relative Entropy Policy Search, 2010.
[39] Tsuneo Yoshikawa. Foundations of Robotics: Analysis and Control, 1990.
[40] Andreas Krause,et al. Advances in Neural Information Processing Systems (NIPS), 2014.
[41] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[42] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[43] Sham M. Kakade,et al. Towards Generalization and Simplicity in Continuous Control , 2017, NIPS.
[44] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[45] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[46] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[47] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.