David Silver | Matteo Hessel | Joseph Modayil | Hado van Hasselt
[1] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[2] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[3] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[4] Jürgen Schmidhuber, et al. HQ-Learning, 1997, Adaptive Behavior.
[5] Yoshua Bengio, et al. Gradient-Based Optimization of Hyperparameters, 2000, Neural Computation.
[6] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[7] Balaraman Ravindran, et al. Dynamic Action Repetition for Deep Reinforcement Learning, 2017, AAAI.
[8] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, Journal of Artificial Intelligence Research.
[9] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[10] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, arXiv.
[11] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[12] Wojciech Czarnecki, et al. Multi-task Deep Reinforcement Learning with PopArt, 2018, AAAI.
[13] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[14] Razvan Pascanu, et al. Understanding the exploding gradient problem, 2012, arXiv.
[15] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[16] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.
[17] David Silver, et al. Learning values across many orders of magnitude, 2016, NIPS.
[18] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[19] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[20] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[21] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[22] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[23] Sergey Levine, et al. Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection, 2016, ISER.
[24] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[25] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[26] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[27] Alexei A. Efros, et al. Investigating Human Priors for Playing Video Games, 2018, ICML.
[28] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[29] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[30] Katja Hofmann, et al. The Malmo Platform for Artificial Intelligence Experimentation, 2016, IJCAI.
[31] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[32] Katherine D. Kinzler, et al. Core knowledge, 2007, Developmental Science.
[33] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.