Mastering Atari, Go, chess and shogi by planning with a learned model
Julian Schrittwieser | Ioannis Antonoglou | Thomas Hubert | Karen Simonyan | Laurent Sifre | Simon Schmitt | Arthur Guez | Edward Lockhart | Demis Hassabis | Thore Graepel | Timothy Lillicrap | David Silver