Junhyuk Oh | Satinder Singh | David Silver | Matteo Hessel | Zhongwen Xu | Hado van Hasselt
[1] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[2] M. J. Sobel, et al. Discounted MDP's: distribution functions and exponential utility maximization, 1987.
[3] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[4] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[8] Pieter Abbeel, et al. Evolved Policy Gradients, 2018, NeurIPS.
[9] Louis Kirsch, et al. Improving Generalization in Meta Reinforcement Learning using Learned Objectives, 2020, ICLR.
[10] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[11] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[12] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[13] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[14] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[15] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[16] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[17] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[18] Junhyuk Oh, et al. Discovering Reinforcement Learning Algorithms, 2020, NeurIPS.
[19] Junhyuk Oh, et al. What Can Learned Intrinsic Rewards Capture?, 2019, ICML.
[20] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[21] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[22] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[23] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[24] Tie-Yan Liu, et al. Beyond Exponentially Discounted Sum: Automatic Learning of Return Function, 2019, ArXiv.
[26] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[27] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[29] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms, 2018, ArXiv.
[30] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[31] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent, 1999.
[32] Alex Graves, et al. Recurrent Models of Visual Attention, 2014, NIPS.
[33] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[34] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[35] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[36] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[37] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[38] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[39] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[40] Junhyuk Oh, et al. A Self-Tuning Actor-Critic Algorithm, 2020, NeurIPS.
[41] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[42] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[43] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[44] Yevgen Chebotar, et al. Meta Learning via Learned Loss, 2019, 25th International Conference on Pattern Recognition (ICPR).