Asynchronous Stochastic Approximation and Q-Learning

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
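To make the algorithm under study concrete, here is a minimal tabular Q-learning sketch on a toy two-state Markov decision problem. The MDP, the exploration policy, and the decreasing step size schedule are illustrative assumptions, not taken from the paper; the paper's convergence results concern this kind of iteration under far more general (including asynchronous) conditions.

```python
import random

# Toy MDP (hypothetical example): state 0 with actions {0, 1}.
# Action 1 moves to the absorbing goal state 1 with reward 1;
# action 0 stays in state 0 with reward 0.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9

def step(state, action):
    """Deterministic transition function of the toy MDP."""
    if state == 0 and action == 1:
        return 1, 1.0   # reach the goal, collect reward 1
    return 0, 0.0       # otherwise stay in state 0, no reward

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != 1:                      # run until the absorbing state
            a = rng.randrange(N_ACTIONS)   # explore uniformly at random
            s2, r = step(s, a)
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]     # decreasing step size per pair
            # Standard Q-learning update toward the bootstrapped target
            target = r + GAMMA * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# For this MDP the optimal values are Q*(0,1) = 1.0 and Q*(0,0) = 0.9
print(Q[0])
```

Because each state-action pair keeps its own visit counter and step size, the same update rule works when different pairs are updated at different (even outdated) times, which is the asynchronous setting the paper analyzes.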
