TD(λ) Converges with Probability 1
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Tamio Shimizu,et al. A Stochastic Approximation Method for Optimization Problems , 1969, Journal of the ACM.
[3] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[4] V. Nollau. Kushner, H. J./Clark, D. S., Stochastic Approximation Methods for Constrained and Unconstrained Systems. (Applied Mathematical Sciences 26). Berlin‐Heidelberg‐New York, Springer‐Verlag 1978. X, 261 S., 4 Abb., DM 26,40. US $ 13.20 , 1980 .
[5] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[6] Harold J. Kushner,et al. Approximation and Weak Convergence Methods for Random Processes , 1984 .
[7] Bruce E. Hajek,et al. Review of 'Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory' (Kushner, H.J.; 1984) , 1985, IEEE Transactions on Information Theory.
[8] C. Watkins. Learning from delayed rewards , 1989 .
[9] Halbert White,et al. Recursive M-estimation, nonlinear regression and neural network learning with dependent observations , 1990 .
[10] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[11] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..
[12] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[13] Elie Bienenstock,et al. Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.
[14] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[15] Peter Dayan,et al. Technical Note: Q-Learning , 1992, Machine Learning.
[16] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.