On the Worst-Case Analysis of Temporal-Difference Learning Algorithms
[1] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[2] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[3] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[4] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[5] Philip M. Long, et al. Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule, 1993, COLT '93.
[6] P. Dayan. The Convergence of TD(λ) for General λ, 1992, Machine Learning.
[7] Philip M. Long, et al. On-line learning of linear functions, 1991, STOC '91.