
On the Worst-Case Analysis of Temporal-Difference Learning Algorithms

Manfred K. Warmuth | Robert E. Schapire

[1] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[2] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.

[3] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[4] Michael I. Jordan, et al. Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.

[5] Philip M. Long, et al. Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule, 1993, COLT '93.

[6] P. Dayan. The Convergence of TD(λ) for General λ, 1992, Machine Learning.

[7] Philip M. Long, et al. On-line learning of linear functions, 1991, STOC '91.