Paper information - TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration

We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.

Bruno Castro da Silva | Andrew G. Barto
