Paper information - KnightCap: A chess program that learns by combining TD(λ) with game-tree search

KnightCap: A chess program that learns by combining TD(λ) with game-tree search

In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLeaf(λ) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play.
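As a rough illustration of the idea behind TDLeaf(λ), the sketch below applies a TD(λ)-style weight update to the leaf positions of the principal variations found by search, rather than to the root positions actually played. It is a minimal sketch under stated assumptions, not KnightCap's implementation: the evaluation is assumed to be linear in a feature vector, the game outcome stands in for the value of the final position, and the names `tdleaf_update`, `leaf_features`, `outcome`, `lam`, and `alpha` are hypothetical.

```python
def tdleaf_update(weights, leaf_features, outcome, lam=0.7, alpha=0.1):
    """Return new weights after one game, using a TDLeaf(lambda)-style update.

    weights       -- current evaluation weights (list of floats)
    leaf_features -- one feature vector per move, taken from the leaf of the
                     principal variation found by search at that move
    outcome       -- value assigned to the final position (e.g. 1, 0, -1)
    lam, alpha    -- lambda and learning rate (illustrative defaults)
    """
    # Assumption: linear evaluation J(x, w) = w . phi(x).
    def evaluate(phi):
        return sum(w * f for w, f in zip(weights, phi))

    # Leaf evaluations, with the game outcome as the final value.
    values = [evaluate(phi) for phi in leaf_features] + [outcome]

    # Temporal differences d_t = J(x_{t+1}) - J(x_t).
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    for t, phi in enumerate(leaf_features):
        # Lambda-discounted sum of the temporal differences from move t onward.
        discounted = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        # For a linear evaluation, the gradient with respect to w is phi itself.
        for i, f in enumerate(phi):
            new_weights[i] += alpha * f * discounted
    return new_weights
```

With `lam=0` each leaf is adjusted only toward the next leaf's value, while `lam=1` moves every leaf toward the final outcome; values in between blend the two.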

Jonathan Baxter
