[1] Pascal Poupart, et al. Bayesian Reinforcement Learning, 2010, Encyclopedia of Machine Learning.
[2] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[3] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.
[4] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[5] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[6] Xin Yao, et al. Increasingly Cautious Optimism for Practical PAC-MDP Exploration, 2015, IJCAI.
[7] Steven D. Whitehead, et al. Complexity and Cooperation in Q-Learning, 1991, ML.
[8] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.
[9] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[10] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[11] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[12] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[13] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[14] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[15] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[16] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[17] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[18] Martin A. Riedmiller, et al. Reinforcement learning for robot soccer, 2009, Auton. Robots.
[19] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[20] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[21] Shie Mannor, et al. "How hard is my MDP?" The distribution-norm to the rescue, 2014, NIPS.
[22] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[23] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[24] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[25] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[26] Lihong Li, et al. Sample Complexity Bounds of Exploration, 2012, Reinforcement Learning.
[27] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.
[28] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[29] Shimon Whiteson, et al. V-MAX: tempered optimism for better PAC reinforcement learning, 2012, AAMAS.
[30] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[31] G. Oehlert, et al. A note on the delta method, 1992.
[32] Claudia Perlich. Learning Curves in Machine Learning, 2017, Encyclopedia of Machine Learning and Data Mining.
[33] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.