Martin A. Riedmiller | Thomas Lampe | Abbas Abdolmaleki | Jost Tobias Springenberg | Roland Hafner | Michael Neunert | Felix Berkenkamp | Noah Y. Siegel
[1] Dmitry Kalashnikov, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[2] Amir-massoud Farahmand, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[3] Scott Fujimoto, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[4] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[5] Alexandre Galashov, et al. Information asymmetry in KL-regularized RL, 2019, ICLR.
[6] Sham Kakade and John Langford. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[7] Yee Whye Teh, et al. Distral: Robust multitask reinforcement learning, 2017, NIPS.
[8] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, 1996, Athena Scientific.
[9] Nicolas Heess, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[10] Aviral Kumar, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[11] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[12] Rishabh Agarwal, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[13] Michail G. Lagoudakis and Ronald Parr. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[14] Emanuel Todorov, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[15] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, 2013, ICLR.
[16] Dhruva Tirumala, et al. Exploiting Hierarchy for Learning and Transfer in KL-regularized RL, 2019, ArXiv.
[17] Tuomas Haarnoja, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[18] Sascha Lange, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning: State-of-the-Art.
[19] Abbas Abdolmaleki, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[20] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[21] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Tuomas Haarnoja, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[23] John Schulman, et al. Trust Region Policy Optimization, 2015, ICML.
[24] Bruno Scherrer, et al. Approximate Modified Policy Iteration and its Application to the Game of Tetris, 2015, J. Mach. Learn. Res.
[25] Natasha Jaques, et al. Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control, 2016, ICML.
[26] John Schulman, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[27] Martin A. Riedmiller, et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[28] Volodymyr Mnih, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Damien Ernst, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[30] Danilo Jimenez Rezende, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[31] Rémi Munos. Error Bounds for Approximate Value Iteration, 2005, AAAI.