An analysis of model-based Interval Estimation for Markov Decision Processes

[1]  Stephen F. Smith,et al.  A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem , 2006, CP.

[2]  Lihong Li,et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.

[3]  Lihong Li,et al.  PAC model-free reinforcement learning , 2006, ICML.

[4]  Michael L. Littman,et al.  A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.

[5]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, MIT Press.

[6]  Michael L. Littman,et al.  An empirical evaluation of interval estimation for Markov decision processes , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[7]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[8]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.

[9]  Laurent El Ghaoui,et al.  Robustness in Markov Decision Problems with Uncertain Transition Matrices , 2003, NIPS.

[10]  Shie Mannor,et al.  Action Elimination and Stopping Conditions for Reinforcement Learning , 2003, ICML.

[11]  Sham M. Kakade,et al.  On the sample complexity of reinforcement learning , 2003 .

[12]  E. Ordentlich,et al.  Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .

[13]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res.

[14]  Shie Mannor,et al.  PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.

[15]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.

[16]  Jeremy L. Wyatt,et al.  Exploration Control in Reinforcement Learning using Optimistic Model Selection , 2001, ICML.

[17]  Jürgen Schmidhuber,et al.  Efficient model-based exploration , 1998 .

[18]  Robert Givan,et al.  Bounded Parameter Markov Decision Processes , 1997, ECP.

[19]  Stewart W. Wilson,et al.  From Animals to Animats 5. Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior , 1997 .

[20]  Claude-Nicolas Fiechter.  Expected Mistake Bound Model for On-Line Reinforcement Learning , 1997, ICML.

[21]  Leslie Pack Kaelbling,et al.  On the Complexity of Solving Markov Decision Problems , 1995, UAI.

[22]  Philip W. L. Fong.  A Quantitative Study of Hypothesis Selection , 1995, ICML.

[23]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[24]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[25]  Leslie Pack Kaelbling,et al.  Learning in embedded systems , 1993 .

[26]  T. Lai Adaptive treatment allocation and the multi-armed bandit problem , 1987 .

[27]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, CACM.