Optimality of LSTD and its Relation to MC
Klaus Obermayer | Steffen Grünewälder | Sepp Hochreiter
[1] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 2004, Machine Learning.
[2] John N. Tsitsiklis, et al. Bias and variance in value function estimation, 2004, ICML.
[3] Andrew W. Moore, et al. Learning evaluation functions for global optimization, 1998.
[4] E. L. Lehmann, et al. Theory of point estimation, 1950.
[5] Peter Dayan, et al. Analytical Mean Squared Error Curves for Temporal Difference Learning, 1996, Machine Learning.
[6] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[7] Sailes K. Sengupta. Fundamentals of Statistical Signal Processing: Estimation Theory, 1995.
[8] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[9] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[10] Michael Kearns, et al. Bias-Variance Error Bounds for Temporal Difference Updates, 2000, COLT.
[11] M. Kendall, et al. Kendall's advanced theory of statistics, 1995.
[12] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[13] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.