Paper information - Discovering the Structure of a Reactive Environment by Exploration

Consider a robot wandering around an unfamiliar environment, performing actions and observing the consequences. The robot's task is to construct an internal model of its environment, a model that will allow it to predict the effects of its actions and to determine what sequences of actions to take to reach particular goal states. Rivest and Schapire (1987a,b; Schapire 1988) have studied this problem and have designed a symbolic algorithm to strategically explore and infer the structure of finite-state environments. The heart of this algorithm is a clever representation of the environment called an update graph. We have developed a connectionist implementation of the update graph using a highly specialized network architecture. With backpropagation learning and a trivial exploration strategy (choosing random actions), the connectionist network can outperform the Rivest and Schapire algorithm on simple problems. Our approach has additional virtues: the network can accommodate stochastic environments, and it suggests generalizations of the update graph representation that do not arise from a traditional, symbolic perspective.

Michael C. Mozer | Jonathan Bachrach

[1] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[2] Ronald L. Rivest, et al. Diversity-based inference of finite automata, 1994, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987).

[3] Robert E. Schapire, et al. A new approach to unsupervised learning in deterministic environments, 1990.

[4] James L. McClelland, et al. Learning Subsequential Structure in Simple Recurrent Networks, 1988, NIPS.

[5] John S. Bridle, et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.

[6] David A. Cohn, et al. Training Connectionist Networks with Queries and Selective Sampling, 1989, NIPS.

[7] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[8] Ronald J. Williams, et al. Gradient-based learning algorithms for recurrent connectionist networks, 1990.