Explaining Temporal Differences to Create Useful Concepts for Evaluating States

We describe a technique for improving problem-solving performance by creating concepts that allow problem states to be evaluated through an efficient recognition process. A temporal-difference (TD) method is used to bootstrap a collection of useful concepts by backing up evaluations from recognized states to their predecessors. This procedure is combined with explanation-based generalization (EBG) and goal regression, which use knowledge of the problem domain to help generalize the new concept definitions. This maintains the efficiency of using the concepts and accelerates the learning process in comparison to knowledge-free approaches. Also, because the learned definitions may describe negative conditions, it becomes possible to use EBG to explain why some instance is not an example of a concept. The learning technique has been elaborated for minimax game playing and tested on a Tic-Tac-Toe system, T2. Given only concepts defining the end-game states and constrained to a two-ply search bound, experiments show that T2 learns concepts for achieving near-perfect play. T2's total searching time, including concept recognition, is within acceptable performance limits, while perfect play without the concepts requires searches taking well over 100 times longer than T2's.
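
The abstract contrasts T2 with knowledge-free approaches. As a point of reference, the following Python sketch shows such a knowledge-free baseline in its simplest tabular form, assuming a rote lookup table rather than T2's generalized concept definitions. At the start, only end-game states are recognized (`terminal_value`). Each sweep then backs up a two-ply minimax value from recognized states into the table entry of their predecessor. EBG and goal regression are deliberately omitted. The board encoding, the neutral 0.0 score for unrecognized states, the sweep schedule, and all function names are hypothetical choices of this sketch, not details from the paper.

```python
# Knowledge-free sketch (hypothetical, not the T2 system): a lookup table of
# learned evaluations, bootstrapped from end-game states by backing up
# two-ply minimax values to predecessors. No EBG, no goal regression.

# Boards are 9-character strings, indices 0..8 row by row; " " marks empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def terminal_value(board):
    """End-game 'concepts': +1 if X has won, -1 if O has won, 0 for a full
    board with no winner, None if the game is not over."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return 1.0 if board[a] == "X" else -1.0
    return 0.0 if " " not in board else None


def to_move(board):
    # X moves first, so X is to move whenever the piece counts are equal.
    return "X" if board.count("X") == board.count("O") else "O"


def successors(board):
    p = to_move(board)
    return [board[:i] + p + board[i + 1:] for i, c in enumerate(board) if c == " "]


def minimax(board, depth, table):
    """Depth-bounded minimax (X maximizes). At the search horizon a state is
    'recognized' if it has a table entry; otherwise it scores a neutral 0.0
    (an assumption of this sketch)."""
    v = terminal_value(board)
    if v is not None:
        return v
    if depth == 0:
        return table.get(board, 0.0)
    values = [minimax(s, depth - 1, table) for s in successors(board)]
    return max(values) if to_move(board) == "X" else min(values)


def reachable_states():
    """All positions reachable from the empty board by legal play."""
    seen, stack = set(), [" " * 9]
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        if terminal_value(b) is None:
            stack.extend(successors(b))
    return seen


def learn(sweeps=5, depth=2, alpha=1.0):
    """Back up the two-ply search value of every non-terminal state into the
    table: a TD-style update of the stored value toward the search target."""
    table = {}
    states = sorted(s for s in reachable_states() if terminal_value(s) is None)
    for _ in range(sweeps):
        for s in states:
            target = minimax(s, depth, table)
            old = table.get(s, 0.0)
            table[s] = old + alpha * (target - old)
    return table
```

With `alpha=1.0` the update simply overwrites the stored value, which suits a deterministic game. Each sweep extends the correctly evaluated horizon by two plies, so five sweeps cover the nine-ply game, after which `learn()[" " * 9]` evaluates the empty board as a draw (0.0). Because every position needs its own entry, nothing learned about one position transfers to another; that per-position cost is what the abstract's use of EBG to generalize concept definitions aims to reduce.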
