Adaptive Choice of Grid and Time in Reinforcement Learning

We propose local error estimates, together with algorithms for adaptive a posteriori grid and time refinement, in reinforcement learning. We consider a deterministic system with continuous state and time and an infinite-horizon discounted cost functional. For grid refinement we follow the procedure used in numerical methods for the Bellman equation. For time refinement we propose a new criterion based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial both for optimal learning rates and for the accuracy of the approximate optimal value function.
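The sketch below gives a rough picture of the kind of discretization the abstract describes. It is a minimal example and assumes a hypothetical one-dimensional problem: state x in [0, 1], dynamics dx/dt = u with u in {-1, 0, +1}, running cost (x - 0.3)^2, and discount factor exp(-ρh) for a time step h. The discrete Bellman equation is solved by value iteration with a semi-Lagrangian scheme using linear interpolation. A residual-based indicator at cell midpoints then marks cells for bisection. All names (`bellman_operator`, `local_error_indicators`, `refine`, `RHO`, the test problem) are illustrative assumptions. They are not the paper's error estimates. The paper's consistency-based time-refinement criterion is also not reproduced here, and h is held fixed.

```python
import numpy as np

# Hypothetical 1-D test problem (an assumption, not taken from the paper):
# state x in [0, 1], dynamics dx/dt = u, controls u in {-1, 0, +1}.
CONTROLS = np.array([-1.0, 0.0, 1.0])
RHO = 1.0  # assumed discount rate


def running_cost(x):
    return (x - 0.3) ** 2


def bellman_operator(values, grid, x_eval, h):
    """Semi-Lagrangian Bellman operator with time step h.

    (T V)(x) = min_u [ h * c(x) + exp(-RHO * h) * V(x + h * u) ],
    where V between grid nodes is obtained by linear interpolation.
    """
    beta = np.exp(-RHO * h)
    candidates = []
    for u in CONTROLS:
        # Assumption: states leaving [0, 1] are clipped back to the boundary.
        x_next = np.clip(x_eval + h * u, grid[0], grid[-1])
        candidates.append(h * running_cost(x_eval) + beta * np.interp(x_next, grid, values))
    return np.min(candidates, axis=0)


def solve_value_function(grid, h, tol=1e-10, max_iter=100_000):
    """Value iteration for the discrete Bellman equation on the grid nodes."""
    values = np.zeros_like(grid)
    for _ in range(max_iter):
        new_values = bellman_operator(values, grid, grid, h)
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values
    return values


def local_error_indicators(values, grid, h):
    """Residual of the discrete Bellman equation at cell midpoints.

    At the nodes the residual is (nearly) zero by construction; at a midpoint it
    measures how strongly the interpolated solution violates the equation in that cell.
    """
    midpoints = 0.5 * (grid[:-1] + grid[1:])
    interpolated = np.interp(midpoints, grid, values)
    return np.abs(bellman_operator(values, grid, midpoints, h) - interpolated)


def refine(grid, indicators, fraction=0.3):
    """Bisect the cells whose indicator lies in the top `fraction` of all cells."""
    threshold = np.quantile(indicators, 1.0 - fraction)
    marked = indicators >= threshold
    new_points = 0.5 * (grid[:-1] + grid[1:])[marked]
    return np.sort(np.concatenate([grid, new_points]))


# Usage: three rounds of solve -> estimate -> refine with a fixed time step h.
grid = np.linspace(0.0, 1.0, 11)
h = 0.05
for _ in range(3):
    V = solve_value_function(grid, h)
    eta = local_error_indicators(V, grid, h)
    grid = refine(grid, eta)
```

Keeping h fixed while the grid is refined is a deliberate simplification. Re-running the loop with several values of h on the same grid is one informal way to see why the ratio of time step to cell width matters, which is the question the abstract's time-refinement criterion addresses.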

Stephan Pareigis