Shie Mannor | Mohammad Ghavamzadeh | Yonathan Efroni
[1] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[2] Yishay Mansour, et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[3] Richard E. Korf. Real-Time Heuristic Search, 1990, Artif. Intell.
[4] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[5] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[6] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[7] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[8] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[9] V. Bulitko, et al. Learning in Real-Time Search: A Unifying Framework, 2011, J. Artif. Intell. Res.
[10] Shie Mannor, et al. Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning, 2018, NeurIPS.
[11] Xian Wu, et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[12] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[13] Shie Mannor, et al. How to Combine Tree-Search Methods in Reinforcement Learning, 2018, AAAI.
[14] Matthieu Geist, et al. Algorithmic Survey of Parametric Value Function Approximation, 2013, IEEE Transactions on Neural Networks and Learning Systems.
[15] Blai Bonet, et al. Planning with Incomplete Information as Heuristic Search in Belief Space, 2000, AIPS.
[16] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[17] Rémi Munos. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[18] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[19] Rémi Munos. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[20] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[21] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, 1996, Athena Scientific.
[22] Robert Givan, et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes, 1997, UAI.
[23] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[24] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[25] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell.