Mean-Field Theory for Batched TD(λ)
[1] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[2] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[3] G. Pflug. Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, HJ.; Clark, D.S., 1980.
[4] P. Dayan, et al. TD(λ) converges with probability 1, 2004, Machine Learning.
[5] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning, 1995, NIPS.
[6] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[7] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[8] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[10] P. Dayan. The Convergence of TD(λ) for General λ, 1992, Machine Learning.
[11] Gene H. Golub, et al. Matrix computations (3rd ed.), 1996.
[12] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[13] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[14] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Mach. Learn.