Error Reducing Sampling in Reinforcement Learning

In reinforcement learning, an agent collects information by interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing the interaction samples so that the corresponding behavior converges quickly to the optimal behavior. Our main result is a sensitivity analysis relating the choice of sampling any state-action pair to the decrease of an error bound on the optimal solution. From it we derive two new model-based algorithms. Simulations demonstrate faster convergence (in terms of the number of samples) of the estimated value function to the true optimal value function.
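The abstract does not spell out the algorithms, but the idea it describes, sampling the state-action pair whose error bound would shrink the most, can be illustrated with a minimal model-based sketch. Everything below is a hypothetical reconstruction, not the paper's method: it uses a Hoeffding-style confidence width `1/sqrt(n(s,a)+1)` as a stand-in for the paper's error bound, a small randomly generated MDP as the environment, and value iteration on the empirical model for planning.

```python
import math
import random

def plan(model_p, model_r, n_states, n_actions, gamma, iters=200):
    """Value iteration on the empirical (estimated) MDP model."""
    v = [0.0] * n_states
    for _ in range(iters):
        v = [max(model_r[(s, a)]
                 + gamma * sum(p * v[s2] for s2, p in model_p[(s, a)].items())
                 for a in range(n_actions))
             for s in range(n_states)]
    return v

def error_guided_sampling(n_states=4, n_actions=2, gamma=0.9, budget=300, seed=0):
    """Sketch of error-bound-guided sampling (hypothetical, not the paper's algorithm)."""
    rng = random.Random(seed)
    # Hidden "true" MDP, used only to generate samples: each (s, a) picks
    # uniformly between two possible next states and yields a fixed reward.
    pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]
    true_next = {sa: [rng.randrange(n_states), rng.randrange(n_states)] for sa in pairs}
    true_r = {sa: rng.random() for sa in pairs}

    counts = {sa: 0 for sa in pairs}
    trans = {sa: {} for sa in pairs}      # empirical transition counts
    rew_sum = {sa: 0.0 for sa in pairs}   # accumulated rewards

    for _ in range(budget):
        # Hoeffding-style width: less-sampled pairs have a larger confidence
        # interval, so sampling them decreases the overall error bound most.
        sa = max(pairs, key=lambda k: 1.0 / math.sqrt(counts[k] + 1))
        s2 = rng.choice(true_next[sa])
        counts[sa] += 1
        trans[sa][s2] = trans[sa].get(s2, 0) + 1
        rew_sum[sa] += true_r[sa]

    # Build the empirical model; the width rule guarantees every pair is visited.
    model_p = {sa: {s2: c / counts[sa] for s2, c in trans[sa].items()} for sa in pairs}
    model_r = {sa: rew_sum[sa] / counts[sa] for sa in pairs}
    return plan(model_p, model_r, n_states, n_actions, gamma), counts
```

With only counts driving the bound, this degenerates to round-robin sampling; the paper's contribution is precisely a sharper criterion, a sensitivity analysis that weighs each pair's influence on the error of the optimal solution rather than its visit count alone.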
