[1] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res.
[2] Richard S. Sutton,et al. A First Empirical Study of Emphatic Temporal Difference Learning , 2017, ArXiv.
[3] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[4] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[5] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[6] Martha White,et al. Online Off-policy Prediction , 2018, ArXiv.
[7] Dimitri P. Bertsekas,et al. A Counterexample to Temporal Differences Learning , 1995, Neural Computation.
[8] Martha White,et al. Emphatic Temporal-Difference Learning , 2015, ArXiv.