Patrick M. Pilarski | Marlos C. Machado | Richard S. Sutton | Harm van Seijen | Ashique Rupam Mahmood
[1] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[2] Manfred K. Warmuth, et al. On the worst-case analysis of temporal-difference learning algorithms, 2004, Machine Learning.
[3] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[6] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[9] B. Hudgins, et al. Myoelectric signal processing for control of powered limb prostheses, 2006, Journal of Electromyography and Kinesiology.
[10] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[11] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[12] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[13] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[14] P. Thomas, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011.
[15] Scott Niekum, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011, NIPS.
[16] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[17] Patrick M. Pilarski, et al. Adaptive artificial limbs: a real-time approach to prediction and anticipation, 2013, IEEE Robotics & Automation Magazine.
[18] Thore Graepel, et al. A Comparison of learning algorithms on the Arcade Learning Environment, 2014, ArXiv.
[19] Jacqueline S. Hebert, et al. Novel Targeted Sensory Reinnervation Technique to Restore Functional Hand Sensation After Transhumeral Amputation, 2014, IEEE Transactions on Neural Systems and Rehabilitation Engineering.
[20] Richard S. Sutton, et al. True online TD(λ), 2014, ICML.
[21] R. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014.
[22] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[23] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[24] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[25] Scott Niekum, et al. Policy Evaluation Using the Ω-Return, 2015, NIPS.
[26] Richard S. Sutton, et al. Off-policy learning based on weighted importance sampling with linear computational complexity, 2015, UAI.
[27] Richard S. Sutton, et al. Learning to Predict Independent of Span, 2015, ArXiv.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[30] Harm van Seijen, et al. Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation, 2016, ArXiv.