TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration
[1] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[2] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[3] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[4] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[5] Michael O. Duff, et al. Design for an Optimal Probe, 2003, ICML.
[6] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[7] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[8] Gerald DeJong, et al. Active reinforcement learning, 2008, ICML '08.
[9] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[10] Andrew G. Barto, et al. An intrinsic reward mechanism for efficient exploration, 2006, ICML.
[11] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[12] Sebastian Thrun, et al. Efficient Exploration In Reinforcement Learning, 1992.