Analytical Mean Squared Error Curves for Temporal Difference Learning
[1] W. Wasow. A note on the inversion of matrices by random walks, 1952.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Bernard Widrow, et al. Adaptive Signal Processing, 1985.
[4] S. Thomas Alexander. Adaptive Signal Processing, 1986, Texts and Monographs in Computer Science.
[5] C. Watkins. Learning from delayed rewards, 1989.
[6] James A. Bucklew. Large Deviation Techniques in Decision, Simulation, and Estimation, 1990.
[7] G. Parmigiani. Large Deviation Techniques in Decision, Simulation and Estimation, 1992.
[8] Etienne Barnard, et al. Temporal-difference methods and Markov models, 1993, IEEE Trans. Syst. Man Cybern.
[9] Andrew G. Barto, et al. Monte Carlo Matrix Inversion and Reinforcement Learning, 1993, NIPS.
[10] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[11] David Haussler, et al. Rigorous Learning Curve Bounds from Statistical Mechanics, 1994, COLT.
[12] Richard S. Sutton, et al. Reinforcement Learning with Replacing Eligibility Traces, 1996, Machine Learning.
[13] Lawrence K. Saul, et al. Learning curve bounds for a Markov decision process with undiscounted rewards, 1996, COLT '96.
[14] Peter Dayan. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[15] John N. Tsitsiklis. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[16] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[17] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.