Shie Mannor | Timothy A. Mann | Todd Hester | Hugo Penedones
[1] Bruno Scherrer, et al. Rate of Convergence and Error Bounds for LSTD(λ), 2015, ICML.
[2] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[3] Scott Sanner, et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, 2010, ICML.
[4] Guy Shani, et al. Evaluating Recommendation Systems, 2011, Recommender Systems Handbook.
[5] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[6] Martha White, et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning, 2016, AAMAS.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] H. He, et al. Efficient Reinforcement Learning Using Recursive Least-Squares Methods, 2001, J. Artif. Intell. Res..
[9] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[10] Scott Niekum, et al. Policy Evaluation Using the Ω-Return, 2015, NIPS.
[11] Klaus-Robert Müller, et al. Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res..
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[13] J. Sherman, et al. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix, 1950.
[14] Philip S. Thomas, Scott Niekum, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011, NIPS.
[15] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[16] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res..