A new Q(lambda) with interim forward view and Monte Carlo equivalence
暂无分享,去创建一个
Doina Precup | Richard S. Sutton | Hado van Hasselt | Ashique Rupam Mahmood | R. Sutton | Doina Precup | H. V. Hasselt | A. Mahmood
[1] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[2] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[3] Peter Dayan,et al. Analytical Mean Squared Error Curves for Temporal Difference Learning , 1996, Machine Learning.
[4] Richard S. Sutton,et al. Multi-timescale nexting in a reinforcement learning robot , 2011, Adapt. Behav..
[5] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[6] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[7] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[8] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[9] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[10] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[11] Scott Sanner,et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda , 2010, ICML.
[12] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[13] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[14] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[15] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[16] Richard S. Sutton,et al. True online TD(λ) , 2014, ICML 2014.
[17] Michael Kearns,et al. Bias-Variance Error Bounds for Temporal Difference Updates , 2000, COLT.
[18] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[19] Richard S. Sutton,et al. Off-policy TD( l) with a true online equivalence , 2014, UAI.
[20] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.