Paper information - KnightCap: A Chess Program That Learns by Combining TD(lambda) with Game-Tree Search

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play.
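The abstract does not spell out the update rule, so the following is only a minimal sketch of how a TD(lambda)-style update might be applied to search results. It assumes a linear evaluation function (so the gradient with respect to the weights is simply the feature vector), and it assumes that for each position in a game the feature vector of the leaf reached by the search's principal variation is already available. The names `evaluate`, `tdleaf_update`, `leaf_features`, `final_outcome`, `alpha`, and `lam` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation (an assumption of this sketch): dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def tdleaf_update(
    weights: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    final_outcome: float,
    alpha: float = 0.1,
    lam: float = 0.7,
) -> List[float]:
    """Return new weights after one game.

    leaf_features[t] is the feature vector of the principal-variation leaf
    found by searching from the t-th position of the game (hypothetical input).
    final_outcome is the game result used as the target for the last position.
    """
    # Evaluate every leaf with the current weights.
    values = [evaluate(weights, f) for f in leaf_features]

    # Temporal differences: each leaf value is compared with the next one,
    # and the last leaf value is compared with the actual game outcome.
    targets = values[1:] + [final_outcome]
    diffs = [target - value for target, value in zip(targets, values)]

    # Lambda-discounted sum of future temporal differences, computed backwards.
    discounted = [0.0] * len(diffs)
    running = 0.0
    for t in reversed(range(len(diffs))):
        running = diffs[t] + lam * running
        discounted[t] = running

    # Gradient of a linear evaluation is the feature vector itself.
    new_weights = list(weights)
    for features, signal in zip(leaf_features, discounted):
        for i, f in enumerate(features):
            new_weights[i] += alpha * f * signal
    return new_weights
```

In this sketch, `lam = 0` makes each leaf learn only from its immediate successor, while `lam = 1` makes every leaf learn from the difference between its own value and the final outcome. A program would call `tdleaf_update` once per finished game, passing the leaves recorded during that game's searches.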