Paper information - An analysis of model-based Interval Estimation for Markov Decision Processes - 字舞流文

An analysis of model-based Interval Estimation for Markov Decision Processes

Michael L. Littman | Alexander L. Strehl

[1] Stephen F. Smith, et al. A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem, 2006, CP.

[2] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.

[3] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.

[4] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.

[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[6] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[7] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[8] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.

[9] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.

[10] Shie Mannor, et al. Action Elimination and Stopping Conditions for Reinforcement Learning, 2003, ICML.

[11] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.

[12] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.

[13] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[14] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.

[15] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[16] Jeremy L. Wyatt, et al. Exploration Control in Reinforcement Learning using Optimistic Model Selection, 2001, ICML.

[17] Jürgen Schmidhuber, et al. Efficient model-based exploration, 1998.

[18] Robert Givan, et al. Bounded Parameter Markov Decision Processes, 1997, ECP.

[19] Stewart W. Wilson, et al. From Animals to Animats 5. Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, 1997.

[20] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning, 1997, ICML.

[21] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.

[22] Philip W. L. Fong. A Quantitative Study of Hypothesis Selection, 1995, ICML.

[23] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[24] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[25] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.

[26] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.

[27] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.