Advances in reinforcement learning and their implications for intelligent control

The focus of this work is on control architectures based on reinforcement learning. A number of recent advances that have contributed to the viability of reinforcement learning approaches to intelligent control are surveyed. These advances include the formalization of the relationship between reinforcement learning and dynamic programming, the use of internal predictive models to improve learning rate, and the integration of reinforcement learning with active perception. On the basis of these advances and other results, it is concluded that control architectures based on reinforcement learning are now in a position to satisfy many of the criteria associated with intelligent control.
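The relationship between reinforcement learning and dynamic programming mentioned above can be made concrete with a small sketch: value iteration computes optimal values from a known model, while Q-learning arrives at the same values from sampled transitions alone. The five-state chain MDP, discount factor, and learning constants below are illustrative assumptions, not taken from the paper.

```python
import random

# Illustrative 5-state chain MDP (an assumption for this sketch):
# actions move left/right; reaching or staying at the right end pays reward 1.
N_STATES = 5
ACTIONS = [-1, +1]  # left, right
GAMMA = 0.9

def step(s, a):
    """Deterministic transition with a bounded state space."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

# Dynamic programming: value iteration using full knowledge of the model.
V = [0.0] * N_STATES
for _ in range(100):
    V = [max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in range(N_STATES)]

# Reinforcement learning: tabular Q-learning from sampled transitions only,
# with no access to the transition or reward model.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)
for _ in range(20000):
    s = random.randrange(N_STATES)
    ai = random.randrange(2)
    s2, r = step(s, ACTIONS[ai])
    Q[s][ai] += 0.1 * (r + GAMMA * max(Q[s2]) - Q[s][ai])

# The greedy values learned from samples approximate the DP solution.
for s in range(N_STATES):
    print(f"state {s}: V*={V[s]:.3f}  max_a Q={max(Q[s]):.3f}")
```

Running this shows the sample-based estimates converging toward the dynamic-programming values, which is the formal connection (due to Watkins and others surveyed here) that underwrites reinforcement learning as a model-free route to optimal control.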
