Richard S. Sutton | Roshan Shariff | Niko Yasui | Abhishek Naik
[1] L. J. Comrie, et al. Mathematical Tables and Other Aids to Computation, 1946.
[2] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[7] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[8] Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control. Optim.
[9] Sridhar Mahadevan, et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 2004, Machine Learning.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] E. Feinberg, et al. Examples Concerning Abelian and Cesaro Limits, 2013, 1310.2482.
[12] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[13] Nicholas Denis. Issues concerning realizability of Blackwell optimal policies in reinforcement learning, 2019, ArXiv.
[14] Yoshua Bengio, et al. Hyperbolic Discounting and Learning over Multiple Horizons, 2019, ArXiv.