Bias-corrected Q-learning to control max-operator bias in Q-learning
[1] C. Watkins. Learning from delayed rewards, 1989.
[2] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[3] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn..
[4] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[5] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[6] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[7] Jong-Hwan Kim,et al. Modular Q-learning based multi-agent cooperation for robot soccer , 2001, Robotics Auton. Syst..
[8] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[9] Jiang Chen,et al. An application in RoboCup combining Q-learning with adversarial planning , 2002, Proceedings of the 4th World Congress on Intelligent Control and Automation (Cat. No.02EX527).
[10] Steve Young,et al. Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning, 2002.
[11] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[12] Jeffrey O. Kephart,et al. Pricing in Agent Economies Using Multi-Agent Q-Learning , 2002, Autonomous Agents and Multi-Agent Systems.
[13] Thore Graepel,et al. Learning to Fight, 2004.
[14] Cheng-Wan An,et al. Mobile robot navigation using neural Q-learning , 2004, Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826).
[15] Yi-Chi Wang,et al. Application of reinforcement learning for agent-based production scheduling , 2005, Eng. Appl. Artif. Intell..
[16] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[17] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[18] Vikram Krishnamurthy,et al. Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control , 2007, IEEE Transactions on Signal Processing.
[19] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[20] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[21] Warren B. Powell,et al. An Intelligent Battery Controller Using Bias-Corrected Q-learning , 2012, AAAI.
[22] Michèle Sebag,et al. The grand challenge of computer Go , 2012, Commun. ACM.