Nonlinear Distributional Gradient Temporal-Difference Learning
Shie Mannor | Huan Xu | Chao Qu