[1] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[4] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[5] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[6] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[7] L. A. Prashanth. Policy Gradients for CVaR-Constrained MDPs, 2014, ALT.
[8] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[11] Shie Mannor, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[12] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[13] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[14] Marek Petrik, et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[15] Alessandro Lazaric, et al. Multi-Bandit Best Arm Identification, 2011, NIPS.
[16] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[17] Kenneth O. Stanley, et al. On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent, 2017, ArXiv.
[18] Kenneth O. Stanley, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents, 2017, NeurIPS.
[19] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[20] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[21] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[22] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[23] Kenneth O. Stanley, et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, 2017, ArXiv.
[24] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] M. T. Wasan. Stochastic Approximation, 1969.
[26] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[27] B. L. Welch. The generalization of 'Student's' problem when several different population variances are involved, 1947, Biometrika.
[28] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[29] L. A. Prashanth. Policy Gradients for CVaR-Constrained MDPs, 2014, ALT.
[30] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[31] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[33] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[34] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[35] Shie Mannor, et al. A Nonparametric Sequential Test for Online Randomized Experiments, 2016, WWW.
[36] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.