论文信息 - Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows to analyse if a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear' interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data.

Artur Merke | Ralf Schoknecht | Ralf Schoknecht | A. Merke

[1] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[2] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.

[3] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[4] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.

[5] Anne Greenbaum,et al. Iterative methods for solving linear systems , 1997, Frontiers in applied mathematics.

[6] Stephan Pareigis,et al. Adaptive Choice of Grid and Time in Reinforcement Learning , 1997, NIPS.

[7] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[8] Michail G. Lagoudakis,et al. Model-Free Least-Squares Policy Iteration , 2001, NIPS.

[9] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[10] Artur Merke,et al. A Necessary Condition of Convergence for Reinforcement Learning with Function Approximation , 2002, ICML.

[11] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.