How to Combine Tree-Search Methods in Reinforcement Learning
Shie Mannor | Bruno Scherrer | Yonathan Efroni | Gal Dalal
[1] Shie Mannor, et al. Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning, 2018, NIPS.
[2] Andrew Tridgell, et al. TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search, 1999, ArXiv.
[3] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[4] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[5] Shie Mannor, et al. Beyond the One Step Greedy Approach in Reinforcement Learning, 2018, ICML.
[7] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[9] Joel Veness, et al. Bootstrapping from Game Tree Search, 2009, NIPS.
[10] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[11] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[12] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[13] Bruno Scherrer, et al. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, 2007.
[14] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[15] Han Liu, et al. Feedback-Based Tree Search for Reinforcement Learning, 2018, ICML.
[16] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Matthew Lai, et al. Giraffe: Using Deep Reinforcement Learning to Play Chess, 2015, ArXiv.
[19] Nathan R. Sturtevant, et al. Monte Carlo Tree Search with heuristic evaluations using implicit minimax backups, 2014, 2014 IEEE Conference on Computational Intelligence and Games.
[21] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[22] Bruno Scherrer, et al. Non-Stationary Approximate Modified Policy Iteration, 2015, ICML.
[23] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.