Online Planning with Lookahead Policies
[1] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[2] Mausam, et al. LRTDP Versus UCT for Online Probabilistic Planning, 2012, AAAI.
[3] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[4] Geoffrey J. Gordon, et al. Bounded Real-Time Dynamic Programming: RTDP with Monotone Upper Bounds and Performance Guarantees, 2005, ICML.
[5] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[6] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[7] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[9] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[10] Matthieu Geist, et al. Algorithmic Survey of Parametric Value Function Approximation, 2013, IEEE Transactions on Neural Networks and Learning Systems.
[11] Robert Givan, et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes, 1997, UAI.
[12] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[13] Xian Wu, et al. Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes, 2018, SODA.
[14] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[15] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[16] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[17] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[18] Vadim Bulitko, et al. Learning in Real-Time Search: A Unifying Framework, 2006, J. Artif. Intell. Res.
[19] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.
[20] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[21] Shie Mannor, et al. How to Combine Tree-Search Methods in Reinforcement Learning, 2019, AAAI.
[22] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell.
[23] Yishay Mansour, et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[24] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[25] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[26] Blai Bonet, et al. Planning with Incomplete Information as Heuristic Search in Belief Space, 2000, AIPS.
[27] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[28] Shie Mannor, et al. Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning, 2018, NeurIPS.
[29] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[30] Alexander L. Strehl, et al. PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP, Technical Report, 2006.