Beyond the One Step Greedy Approach in Reinforcement Learning
Yonathan Efroni | Gal Dalal | Bruno Scherrer | Shie Mannor
[1] Bruno Bouzy, et al. Monte-Carlo Go Developments, 2003, ACG.
[2] Bruno Scherrer, et al. Performance bounds for λ policy iteration and application to the game of Tetris, 2013, J. Mach. Learn. Res.
[3] Brian Sheppard, et al. World-championship-caliber Scrabble, 2002, Artif. Intell.
[4] Rémi Munos, et al. Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, 2016, NIPS.
[5] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[6] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[7] Andrew Y. Ng, et al. Shaping and policy search in reinforcement learning, 2003.
[8] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[9] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[10] Dimitri P. Bertsekas, et al. λ-Policy Iteration: A Review and a New Implementation, 2013, ArXiv.
[11] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[12] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[13] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[14] Bruno Scherrer, et al. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration, 2013, Math. Oper. Res.
[15] Rémi Munos, et al. Optimistic Planning of Deterministic Systems, 2008, EWRL.
[16] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[17] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[18] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[19] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] Damien Ernst, et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, 2015, ArXiv.
[21] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[22] Marek Petrik, et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor, 2008, NIPS.
[23] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[24] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[25] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[26] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of the 34th IEEE Conference on Decision and Control.
[27] B. Scherrer, et al. Performance bound for Approximate Optimistic Policy Iteration, 2010.
[28] Rémi Munos, et al. Optimistic Planning in Markov Decision Processes Using a Generative Model, 2014, NIPS.
[29] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[30] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[31] Lucian Busoniu, et al. Optimistic planning for Markov decision processes, 2012, AISTATS.
[32] Bruno Scherrer, et al. Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris, 2007.
[33] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[34] Joel Veness, et al. Bootstrapping from Game Tree Search, 2009, NIPS.
[35] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[36] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.