暂无分享,去创建一个
[1] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[2] Martha White,et al. Emphatic Temporal-Difference Learning , 2015, ArXiv.
[3] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[4] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[5] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[6] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[7] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[8] D. Bertsekas,et al. Journal of Computational and Applied Mathematics Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022 .
[9] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[10] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[11] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[12] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.