David Silver, Matteo Hessel, Fabio Viola, Arthur Guez, Laurent Sifre, Ivo Danihelka, Theophane Weber, Hado van Hasselt, Simon Schmitt
[1] Satinder Singh, et al. The Value Equivalence Principle for Model-Based Reinforcement Learning, 2020, NeurIPS.
[2] Jürgen Schmidhuber, et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments, 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[4] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[5] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[6] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[7] Patrick M. Pilarski, et al. Model-free reinforcement learning with continuous action in practice, 2012, 2012 American Control Conference (ACC).
[8] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[9] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[10] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[11] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[12] Sergey Levine, et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[13] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[16] David Silver, et al. Learning values across many orders of magnitude, 2016, NIPS.
[17] Bruno Scherrer, et al. Leverage the Average: an Analysis of Regularization in RL, 2020, ArXiv.
[18] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[19] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[20] Doina Precup, et al. Value-driven Hindsight Modelling, 2020, NeurIPS.
[21] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[22] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Jackie Kay, et al. Local Search for Policy Iteration in Continuous Control, 2020, ArXiv.
[25] Mohammad Ghavamzadeh, et al. Mirror Descent Policy Optimization, 2020, ArXiv.
[26] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[27] Daniel Guo, et al. Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning, 2020, ICML.
[28] Rémi Munos, et al. Neural Predictive Belief Representations, 2018, ArXiv.
[29] Karen Simonyan, et al. Off-Policy Actor-Critic with Shared Experience Replay, 2020, ICML.
[30] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[31] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[32] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[33] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[34] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[35] Jessica B. Hamrick, et al. Analogues of mental simulation and imagination in deep learning, 2019, Current Opinion in Behavioral Sciences.
[36] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[37] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[38] Fabio Viola, et al. Causally Correct Partial Models for Reinforcement Learning, 2020, ArXiv.
[39] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[40] Shimon Whiteson, et al. TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning, 2017, ICLR.
[41] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.
[42] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[43] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[44] Catholijn M. Jonker, et al. Model-based Reinforcement Learning: A Survey, 2020, ArXiv.
[45] Martin A. Riedmiller, et al. Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models, 2019, CoRL.
[46] Eric Nalisnick, et al. Normalizing Flows for Probabilistic Modeling and Inference, 2019, J. Mach. Learn. Res.
[47] John Schulman, et al. Phasic Policy Gradient, 2020, ICML.
[48] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[49] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[50] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[51] Thomas Degris, et al. Scaling-up Knowledge for a Cognizant Robot, 2012, AAAI Spring Symposium: Designing Intelligent Robots.
[52] Tom Eccles, et al. An investigation of model-free planning, 2019, ICML.
[53] Petr Baudis, et al. PACHI: State of the Art Open Source Go Program, 2011, ACG.
[54] David Silver, et al. On Inductive Biases in Deep Reinforcement Learning, 2019, ArXiv.
[55] Leslie Pack Kaelbling, et al. Hierarchical task and motion planning in the now, 2011, 2011 IEEE International Conference on Robotics and Automation.
[56] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[57] David Silver, et al. Online and Offline Reinforcement Learning by Planning with a Learned Model, 2021, NeurIPS.
[58] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[59] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.
[60] Razvan Pascanu, et al. Learning model-based planning from scratch, 2017, ArXiv.
[61] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[62] David Silver, et al. Learning and Planning in Complex Action Spaces, 2021, ICML.
[63] Matteo Hessel, et al. General non-linear Bellman equations, 2019, ArXiv.
[64] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[65] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[66] Matteo Hessel, et al. When to use parametric models in reinforcement learning?, 2019, NeurIPS.
[67] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[68] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[69] M. A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[70] Matteo Hessel, et al. Podracer architectures for scalable Reinforcement Learning, 2021, ArXiv.
[71] Sriram Srinivasan, et al. OpenSpiel: A Framework for Reinforcement Learning in Games, 2019, ArXiv.
[72] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[73] Jessica B. Hamrick, et al. On the role of planning in model-based deep reinforcement learning, 2020, ArXiv.
[74] Richard S. Sutton, et al. Learning to Predict Independent of Span, 2015, ArXiv.
[75] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[76] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[77] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[78] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[79] J. L. Testud, et al. Model predictive heuristic control, 1978.
[80] Jürgen Schmidhuber, et al. Recurrent World Models Facilitate Policy Evolution, 2018, NeurIPS.
[81] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[82] Aaron van den Oord, et al. Shaping Belief States with Generative Environment Models for RL, 2019, NeurIPS.
[83] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[84] J. Richalet, et al. Model predictive heuristic control: Applications to industrial processes, 1978, Autom.
[85] Hao Chen, et al. ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search, 2018, AAAI.
[86] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[87] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[88] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[89] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[90] Allan Jabri, et al. Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control, 2018, ICML.
[91] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[92] Jessica B. Hamrick, et al. Combining Q-Learning and Search with Amortized Value Estimates, 2020, ICLR.
[93] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[94] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[95] Mohammad Norouzi, et al. Mastering Atari with Discrete World Models, 2020, ICLR.
[96] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.