Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons

Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains. However, a number of hurdles must be cleared before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. The algorithm recursively splits the state space, guided by statistical measures of differences in the reinforcements received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning. We compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previously reported success of backpropagation can be explained by the linearity of its application domain. In another domain, G reliably found the optimal policy, whereas backpropagation failed to do so in every run across many combinations of parameters.
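To make the recursive-splitting idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes binary percept vectors, an immediate scalar reinforcement signal, and Welch's t-test (scipy.stats.ttest_ind) at a hypothetical significance threshold as the statistical measure of difference; the names Leaf, Node, and update are illustrative only, and the real G algorithm additionally maintains Q-value statistics so that it can handle delayed rather than immediate reinforcement.

```python
from scipy import stats

SIGNIFICANCE = 0.01   # assumed significance level; the paper's may differ
MIN_SAMPLES = 5       # assumed minimum evidence before testing a split


class Leaf:
    """One block of the current state-space partition."""
    def __init__(self, fixed_bits):
        self.fixed_bits = fixed_bits   # input bits already split on above us
        # reinforcement samples, bucketed by each candidate bit's value
        self.samples = {}              # bit index -> {0: [r, ...], 1: [r, ...]}

    def record(self, percept, reward):
        for bit in range(len(percept)):
            if bit not in self.fixed_bits:
                bucket = self.samples.setdefault(bit, {0: [], 1: []})
                bucket[percept[bit]].append(reward)

    def significant_bit(self):
        """Return a bit whose two values yield statistically different
        reinforcement, or None if no split is justified yet."""
        for bit, bucket in self.samples.items():
            zeros, ones = bucket[0], bucket[1]
            if len(zeros) >= MIN_SAMPLES and len(ones) >= MIN_SAMPLES:
                _, p = stats.ttest_ind(zeros, ones, equal_var=False)
                if p < SIGNIFICANCE:
                    return bit
        return None


class Node:
    """An internal split: the state space is divided on one input bit."""
    def __init__(self, bit, fixed_bits):
        self.bit = bit
        self.children = {v: Leaf(fixed_bits | {bit}) for v in (0, 1)}


def update(tree, percept, reward):
    """Record one experience, splitting the active leaf if the statistics
    warrant it; returns the (possibly new) root of the tree."""
    parent, key, node = None, None, tree
    while isinstance(node, Node):
        parent, key = node, percept[node.bit]
        node = node.children[key]
    node.record(percept, reward)
    bit = node.significant_bit()
    if bit is not None:                   # evidence of a real difference
        replacement = Node(bit, node.fixed_bits)
        if parent is None:
            return replacement            # the root leaf itself split
        parent.children[key] = replacement
    return tree
```

Starting from tree = Leaf(set()) and calling tree = update(tree, percept, reward) after each step grows the partition only where the reinforcement statistics demand it, which is the source of the predictability claimed for the method.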
