Model-based reinforcement learning with nearly tight exploration complexity bounds
[1] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[2] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[3] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[4] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[5] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[6] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[7] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[8] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[9] András Lörincz, et al. The many faces of optimism: a unifying approach, 2008, ICML '08.
[10] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.