Sepp Hochreiter | Markus Hofmarcher | Philipp Renz | Angela Bitto-Nemling | Marius-Constantin Dinu | Kajetan Schweighofer | Vihang Patil
[1] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[2] Olivier Sigaud, et al. Offline Reinforcement Learning Hands-On, 2020, ArXiv.
[3] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[4] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[5] Sepp Hochreiter, et al. Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER, 2020, Trans. Large Scale Data Knowl. Centered Syst.
[6] P. Flajolet, et al. HyperLogLog: The Analysis of a Near-Optimal Cardinality Estimation Algorithm, 2007.
[7] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[8] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[9] Richard S. Sutton, et al. Temporal Credit Assignment in Reinforcement Learning, 1984.
[10] Fred L. Drake, et al. Python 3 Reference Manual, 2009.
[11] Sepp Hochreiter, et al. Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution, 2020, ArXiv.
[12] Martin A. Riedmiller, et al. Collect & Infer - a Fresh Look at Data-Efficient Reinforcement Learning, 2021, CoRL.
[13] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[14] Dean Pomerleau, et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991, Neural Computation.
[15] John D. Hunter, et al. Matplotlib: A 2D Graphics Environment, 2007, Computing in Science & Engineering.
[16] Sergey Levine, et al. RoboNet: Large-Scale Multi-Robot Learning, 2019, CoRL.
[17] R. McFarlane. A Survey of Exploration Strategies in Reinforcement Learning, 2003.
[18] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[19] Kevin Skadron, et al. Scalable Parallel Programming, 2008, IEEE Hot Chips 20 Symposium (HCS).
[20] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.
[21] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[22] Trevor Darrell, et al. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling, 2018, ArXiv.
[23] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[24] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[26] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, ArXiv.
[27] Sergey Levine, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.
[28] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.
[29] Tian Tian, et al. MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments, 2019.
[30] Sepp Hochreiter, et al. Self-Normalizing Neural Networks, 2017, NIPS.
[31] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.