Mean-Field Theory for Batched TD(λ)

A representation-independent mean-field dynamics is presented for batched TD(λ). The task is learning to predict the outcome of an indirectly observed absorbing Markov process. In the case of linear representations, the discrete-time deterministic iteration is an affine map whose fixed point can be expressed in closed form without the assumption of linearly independent observation vectors. Batched linear TD(λ) is proved to converge with probability 1 for all λ. Theory and simulation agree on a random walk example.
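
As an illustration only, the sketch below iterates an expected (mean) linear TD(λ) update on a small random walk and checks that it settles at the true outcome probabilities. It is not taken from the paper. The expected-update expression is the usual one for linear TD(λ) on an absorbing chain, and the five-state walk, the one-hot observation vectors, the central start state, the step size, and the names `random_walk`, `mean_field_map`, and `iterate` are all assumptions made for this example.

```python
import numpy as np

def random_walk(n_states=5):
    """Hypothetical example chain: symmetric walk on n_states transient states,
    absorbing on the left with outcome 0 and on the right with outcome 1."""
    Q = np.zeros((n_states, n_states))  # transient-to-transient transitions
    for i in range(n_states - 1):
        Q[i, i + 1] = 0.5
        Q[i + 1, i] = 0.5
    h = np.zeros(n_states)              # expected outcome received in one step
    h[-1] = 0.5                         # rightmost state exits right with prob. 0.5
    return Q, h

def mean_field_map(Q, h, X, mu, lam):
    """Return (A, b) so that the expected update is w + alpha * (b - A @ w).
    Rows of X are observation vectors; mu is the start-state distribution."""
    I = np.eye(Q.shape[0])
    d = np.linalg.solve((I - Q).T, mu)            # expected visits per state
    M = np.diag(d) @ np.linalg.inv(I - lam * Q)   # visit weights with lambda-discounted traces
    A = X.T @ M @ (I - Q) @ X
    b = X.T @ M @ h
    return A, b

def iterate(A, b, alpha=0.1, n_steps=5000):
    """Apply the affine map repeatedly, starting from zero weights."""
    w = np.zeros(A.shape[0])
    for _ in range(n_steps):
        w = w + alpha * (b - A @ w)
    return w

Q, h = random_walk()
X = np.eye(5)                                     # assumed one-hot observation vectors
mu = np.array([0.0, 0.0, 1.0, 0.0, 0.0])          # assumed start in the middle state
true_values = np.linalg.solve(np.eye(5) - Q, h)   # probability of right-side absorption
weights = {lam: iterate(*mean_field_map(Q, h, X, mu, lam)) for lam in (0.0, 0.5, 1.0)}

for lam, w in weights.items():
    print(f"lambda={lam}: max error {np.max(np.abs(w - true_values)):.2e}")
```

With one-hot observation vectors the matrix `A` is invertible, so the iteration has a unique fixed point for each λ tried here. The case of linearly dependent observation vectors that the abstract mentions is not covered by this sketch.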

Fernando J. Pineda
