Discovering the Structure of a Reactive Environment by Exploration

Consider a robot wandering around an unfamiliar environment, performing actions and observing the consequences. The robot's task is to construct an internal model of its environment, a model that will allow it to predict the effects of its actions and to determine what sequences of actions to take to reach particular goal states. Rivest and Schapire (1987a,b; Schapire 1988) have studied this problem and have designed a symbolic algorithm to strategically explore and infer the structure of finite state environments. The heart of this algorithm is a clever representation of the environment called an update graph. We have developed a connectionist implementation of the update graph using a highly specialized network architecture. With backpropagation learning and a trivial exploration strategy choosing random actions the connectionist network can outperform the Rivest and Schapire algorithm on simple problems. Our approach has additional virtues, including the fact that the network can accommodate stochastic environments and that it suggests generalizations of the update graph representation that do not arise from a traditional, symbolic perspective.