Combining online and offline knowledge in UCT