[1] Tian Tian,et al. MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments, 2019.
[2] Paul Wagner,et al. Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result , 2013, NIPS.
[3] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[4] Craig Boutilier,et al. Non-delusional Q-learning and value-iteration , 2018, NeurIPS.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] James Bergstra,et al. Autoregressive Policies for Continuous Control Deep Reinforcement Learning , 2019, IJCAI.
[8] Theodore J. Perkins,et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains , 2002, ICML.
[9] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[10] Kavosh Asadi,et al. An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.
[11] Jurgen Schmidhuber,et al. Training Agents using Upside-Down Reinforcement Learning , 2019, ArXiv.
[12] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[13] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[14] Nicolas Le Roux,et al. The Value Function Polytope in Reinforcement Learning , 2019, ICML.
[15] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[16] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[17] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[18] Matthieu Geist,et al. Deep Conservative Policy Iteration , 2019, AAAI.
[19] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[20] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[21] R. Sutton,et al. Gradient temporal-difference learning algorithms, 2011.
[22] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[23] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[24] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[25] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[26] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.