Dyna, an integrated architecture for learning, planning, and reacting

Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and often incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples (these are among the basic building blocks of the architecture), yet it is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
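As a rough illustration of how the three components could fit together, the following is a minimal sketch, not the paper's implementation. It assumes a tabular value function updated by a Q-learning-style backup and a deterministic "last outcome seen" world model; the abstract itself stresses that Dyna is not tied to any particular learning method and that models may be probabilistic. The names `DynaAgent`, `act`, `observe`, `planning_steps`, and `run_corridor`, as well as the corridor task, are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


class DynaAgent:
    """Hypothetical tabular Dyna-style agent (illustrative names throughout)."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, epsilon=0.1,
                 planning_steps=10, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.model = {}              # (state, action) -> (reward, next_state, done)
        self.rng = random.Random(seed)

    def act(self, state):
        # Reactive execution: a table lookup, with no planning between
        # perception and action. Ties are broken at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = max(self.q[(state, a)] for a in self.actions)
        return self.rng.choice(
            [a for a in self.actions if self.q[(state, a)] == best])

    def _backup(self, s, a, r, s2, done):
        # One-step value backup (assumed here: Q-learning style).
        future = 0.0 if done else max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * future - self.q[(s, a)])

    def observe(self, s, a, r, s2, done):
        self._backup(s, a, r, s2, done)       # learn from real experience
        self.model[(s, a)] = (r, s2, done)    # update the learned world model
        for _ in range(self.planning_steps):  # incremental planning on the model
            (ps, pa), (pr, ps2, pdone) = self.rng.choice(list(self.model.items()))
            self._backup(ps, pa, pr, ps2, pdone)


def run_corridor(episodes=30, length=6):
    """Toy task (an assumption): walk right along a corridor to the last cell."""
    agent = DynaAgent(actions=[-1, +1])
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # step cap per episode
            a = agent.act(s)
            s2 = min(max(s + a, 0), length - 1)
            done = s2 == length - 1
            agent.observe(s, a, 1.0 if done else 0.0, s2, done)
            s = s2
            if done:
                break
    return agent
```

In this sketch, `act` never consults the model, so action selection stays reactive, while each call to `observe` interleaves one real update with a few simulated ones. Setting `planning_steps=0` would reduce the agent to plain model-free learning, which makes the contribution of planning easy to compare on a toy task.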
