暂无分享,去创建一个
Nando de Freitas | Alexander Novikov | Mohammad Norouzi | Rishabh Agarwal | George Tucker | Gabriel Dulac-Arnold | Matt Hoffman | Ziyu Wang | Ofir Nachum | Nicolas Heess | Jerry Li | Cosmin Paduraru | Josh Merel | Caglar Gulcehre | Konrad Zolna | Tom Le Paine | Daniel Mankowitz | Sergio G'omez Colmenarejo
[1] Hervé Frezza-Buet,et al. Sample-efficient batch reinforcement learning for dialogue management optimization , 2011, TSLP.
[2] Jakub W. Pachocki,et al. Dota 2 with Large Scale Deep Reinforcement Learning , 2019, ArXiv.
[3] Kallirroi Georgila,et al. Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets , 2008, CL.
[4] Mohammad Norouzi,et al. An Optimistic Perspective on Offline Reinforcement Learning , 2020, ICML.
[5] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[6] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[7] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[8] Yuval Tassa,et al. dm_control: Software and Tasks for Continuous Control , 2020, Softw. Impacts.
[9] Nicolas Heess,et al. Hierarchical visuomotor control of humanoids , 2018, ICLR.
[10] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[11] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 2015 .
[12] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[13] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[14] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[15] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[16] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[17] Richard Tanburn,et al. Making Efficient Use of Demonstrations to Solve Hard Exploration Problems , 2019, ICLR.
[18] Mohammad Norouzi,et al. An Optimistic Perspective on Offline Deep Reinforcement Learning , 2020, International Conference on Machine Learning.
[19] Sergey Levine,et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning , 2019, ArXiv.
[20] Jiri Matas,et al. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images , 2016, ArXiv.
[21] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[22] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[24] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[25] Yee Whye Teh,et al. Neural probabilistic motor primitives for humanoid control , 2018, ICLR.
[26] Romain Laroche,et al. Safe Policy Improvement with Baseline Bootstrapping , 2017, ICML.
[27] Oleg O. Sushkov,et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning , 2019, Robotics: Science and Systems.
[28] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[29] Yuval Tassa,et al. Deep neuroethology of a virtual rodent , 2019, ICLR.
[30] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[31] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[32] Natasha Jaques,et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog , 2019, ArXiv.
[33] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[34] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[35] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[36] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[37] H. Francis Song,et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control , 2019, ICLR.
[38] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[39] Yisong Yue,et al. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning , 2019, ArXiv.
[40] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[41] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[42] Martin A. Riedmiller,et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning , 2020, ICLR.
[43] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[44] Justin Fu,et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning , 2020, ArXiv.
[45] Nir Levine,et al. An empirical investigation of the challenges of real-world reinforcement learning , 2020, ArXiv.
[46] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[47] Joelle Pineau,et al. Benchmarking Batch Deep Reinforcement Learning Algorithms , 2019, ArXiv.