Increasing the Action Gap: New Operators for Reinforcement Learning
Marc G. Bellemare | Georg Ostrovski | Arthur Guez | Philip S. Thomas | Rémi Munos