True online TD(λ)
暂无分享,去创建一个
[1] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[2] Manfred K. Warmuth,et al. On the worst-case analysis of temporal-difference learning algorithms , 2004, Machine Learning.
[3] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[4] Etienne Barnard,et al. Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..
[5] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[6] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[7] Richard S. Sutton,et al. True Online TD(lambda) , 2014, ICML.
[8] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[9] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[10] Csaba Szepesv. Algorithms for Reinforcement Learning , 2010 .
[11] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[12] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.