Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return
Martha White | Richard S. Sutton | Adam White | Craig Sherstan | Kenny Young | Brendan Bennett | Dylan R. Ashley