Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning
William F. Whitney | Michael Bloesch | Jost Tobias Springenberg | Abbas Abdolmaleki | Kyunghyun Cho | Martin A. Riedmiller
[1] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[2] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[3] Jean Feydy,et al. Kernel Operations on the GPU, with Autodiff, without Memory Overflows , 2020, ArXiv.
[4] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[5] Yuval Tassa,et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation , 2017, ArXiv.
[6] Martin A. Riedmiller,et al. Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics , 2020, CoRL.
[7] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[8] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[9] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[10] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[11] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[12] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[13] Daniel Guo,et al. Never Give Up: Learning Directed Exploration Strategies , 2020, ICLR.
[14] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[15] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[16] Georg Ostrovski,et al. Temporally-Extended ε-Greedy Exploration , 2020, ICLR.
[17] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.
[18] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[19] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[20] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[21] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[22] Abhinav Gupta,et al. Dynamics-aware Embeddings , 2019, ICLR.
[23] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[24] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[25] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[26] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2019, J. Mach. Learn. Res..
[27] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[28] Martin A. Riedmiller,et al. Reinforcement learning on explicitly specified time scales , 2003, Neural Computing & Applications.
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[30] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[31] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[32] Marlos C. Machado,et al. Count-Based Exploration with the Successor Representation , 2018, AAAI.
[33] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[34] Shimon Whiteson,et al. Optimistic Exploration even with a Pessimistic Initialisation , 2020, ICLR.
[35] Christopher F. Parmeter,et al. Normal reference bandwidths for the general order, multivariate kernel density derivative estimator , 2012 .
[36] Marlos C. Machado,et al. On Bonus Based Exploration Methods In The Arcade Learning Environment , 2020, ICLR.
[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[38] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[39] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933, Biometrika.
[40] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[41] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[42] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[43] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.