A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method

In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal control problems in which both the state space and time are continuous. Computing a good approximation of the value function is essential, because it provides the optimal control, but it is a difficult task in the continuous case. Indeed, as several authors have pointed out, approximating the value function with parameterized functions such as neural networks may produce very poor results and even diverge. In fact, we show that classical algorithms, like Q-learning, used with a simple look-up table built on a regular grid, may fail to converge. The main reason is that the discretization of the state space implies a loss of the Markov property, even for deterministic continuous processes. We propose to approximate the value function with a convergent numerical scheme based on a Finite Difference approximation of the Hamilton-Jacobi-Bellman equation. We then present a model-free reinforcement learning algorithm, called Finite Difference Reinforcement Learning, and prove its convergence to the value function of the continuous problem.
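The abstract does not reproduce the underlying equations, so the following is a minimal sketch of the standard setting it refers to; the notation (dynamics f, reward r, discount factor γ, grid step δ) is assumed here, not taken from the paper. For a deterministic controlled process with discounted reward, the value function satisfies a Hamilton-Jacobi-Bellman equation, and an upwind finite-difference scheme replaces the gradient term by one-sided differences taken in the direction of the flow:

\[
\dot{x}(t) = f\bigl(x(t), u(t)\bigr), \qquad
V(x) = \sup_{u(\cdot)} \int_0^{\infty} \gamma^{t}\, r\bigl(x(t), u(t)\bigr)\, dt ,
\]
\[
V(x)\,\ln\gamma \;+\; \sup_{u}\Bigl[\, \nabla V(x)\cdot f(x,u) + r(x,u) \Bigr] \;=\; 0
\qquad \text{(in the viscosity sense)},
\]
\[
\nabla V(\xi)\cdot f(\xi,u) \;\approx\;
\sum_{i=1}^{d} \bigl|f_i(\xi,u)\bigr|\,
\frac{V\bigl(\xi + \delta\,\operatorname{sign}(f_i(\xi,u))\,e_i\bigr) - V(\xi)}{\delta}.
\]

Substituting the one-sided differences into the HJB equation and solving for V(ξ) turns the scheme into a discounted dynamic-programming fixed point on the grid, in which the continuous flow is read as a Markov chain jumping to upwind neighbours. The sketch below solves this fixed point by synchronous value iteration when the dynamics f and reward r are known; it is only an illustration of that construction, not the paper's FDRL algorithm (which is model-free and estimates the same quantities from observed transitions), and all function names and signatures are assumptions.

```python
import numpy as np
from itertools import product

def fd_value_iteration(f, r, actions, lows, highs, delta, gamma, n_iters=500):
    """Value iteration for the grid fixed point induced by an upwind
    finite-difference scheme (illustrative sketch, not the paper's FDRL).

    f(x, u) -> dx/dt, r(x, u) -> instantaneous reward, actions: finite set,
    lows/highs: bounds of the rectangular grid, delta: grid step, gamma in (0, 1).
    """
    dims = len(lows)
    shape = tuple(int(round((hi - lo) / delta)) + 1 for lo, hi in zip(lows, highs))
    V = np.zeros(shape)

    def state(idx):
        # Coordinates of the grid point with integer multi-index idx.
        return np.array([lo + i * delta for lo, i in zip(lows, idx)])

    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for idx in product(*(range(n) for n in shape)):
            x = state(idx)
            best = -np.inf
            for u in actions:
                fx = np.asarray(f(x, u), dtype=float)
                norm1 = np.abs(fx).sum()
                if norm1 == 0.0:
                    # Stationary point: the standing reward is collected forever.
                    q = -r(x, u) / np.log(gamma)
                else:
                    tau = delta / norm1                  # time to cross one cell
                    q = tau * r(x, u)
                    for i in range(dims):
                        if fx[i] == 0.0:
                            continue
                        # Upwind neighbour in the direction of the flow,
                        # clamped at the grid boundary.
                        j = list(idx)
                        j[i] = min(max(j[i] + int(np.sign(fx[i])), 0), shape[i] - 1)
                        q += gamma ** tau * (abs(fx[i]) / norm1) * V[tuple(j)]
                best = max(best, q)
            V_new[idx] = best
        V = V_new
    return V
```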
