Near-Optimal Reinforcement Learning in Polynomial Time
暂无分享,去创建一个
[1] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[2] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[3] Richard S. Sutton,et al. Sequential Decision Problems and Neural Networks , 1989, NIPS 1989.
[4] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[5] A. Jalali,et al. A distributed asynchronous algorithm for expected average cost dynamic programming , 1990, 29th IEEE Conference on Decision and Control.
[6] Sebastian Thrun,et al. The role of exploration in learning control , 1992 .
[7] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[8] D. Sofge. THE ROLE OF EXPLORATION IN LEARNING CONTROL , 1992 .
[9] Donald A. Sofge,et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , 1992 .
[10] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[11] Alistair Sinclair,et al. Algorithms for Random Generation and Counting: A Markov Chain Approach , 1993, Progress in Theoretical Computer Science.
[12] Andrew G. Barto,et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.
[13] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[14] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[16] C. Fiechter. Eecient Reinforcement Learning , 1994 .
[17] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[18] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[19] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[20] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[21] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[22] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[23] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[24] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[25] Manfred K. Warmuth,et al. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms , 2005, Machine Learning.
[26] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[27] Lawrence K. Saul,et al. Learning curve bounds for a Markov decision process with undiscounted rewards , 1996, COLT '96.
[28] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[29] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning , 1997, ICML.
[30] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[31] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[32] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[33] Peter Dayan,et al. Analytical Mean Squared Error Curves for Temporal Difference Learning , 1996, Machine Learning.
[34] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[35] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[36] Manfred K. Warmuth,et al. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms , 2005, Machine Learning.
[37] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[38] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[39] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[40] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.