Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots
[1] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[2] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016.
[3] Adam M White, et al. Developing a Predictive Approach to Knowledge, 2015.
[4] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[5] R. Sutton. The Grand Challenge of Predictive Empirical Abstract Knowledge, 2009.
[6] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[7] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[8] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[9] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, ArXiv.
[10] Pascal Vincent, et al. Convergent Tree-Backup and Retrace with Function Approximation, 2017, ICML.
[11] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[12] Martha White, et al. Investigating Practical Linear Temporal Difference Learning, 2016, AAMAS.
[13] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[14] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[15] Martin A. Riedmiller, et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[16] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[17] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[18] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.
[19] Martha White, et al. Online Off-policy Prediction, 2018, ArXiv.
[20] Chris Watkins, et al. Learning from delayed rewards, 1989.
[21] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[22] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[25] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[26] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[27] Shie Mannor, et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis, 2015, AAAI.
[28] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[29] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[30] Richard S. Sutton, et al. TD(λ) networks: temporal-difference networks with eligibility traces, 2005, ICML.
[31] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[32] Leah Hackman. Faster Gradient-TD Algorithms, 2012, University of Alberta.
[33] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[34] Romain Laroche, et al. Hybrid Reward Architecture for Reinforcement Learning, 2017, NIPS.
[35] Richard S. Sutton, et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.
[36] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[37] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[38] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[39] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[40] Leah M Hackman, et al. Faster Gradient-TD Algorithms, 2013.
[41] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).