Integrated Modeling and Control Based on Reinforcement Learning
This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-learning) are easy to adapt for use in changing environments.
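The integration described above can be illustrated with a minimal tabular Dyna-Q sketch. This is an assumption-laden toy, not the paper's implementation: the environment (a five-state corridor with a goal reward), the parameter values, and the epsilon-greedy exploration scheme are all illustrative choices. It does show the three interleaved steps the abstract names: direct reinforcement learning from a real transition, updating a learned forward model, and planning by replaying sampled transitions from that model.

```python
import random

# Toy Dyna-Q sketch (hypothetical corridor world; parameters are
# illustrative, not taken from the paper).
N_STATES = 5          # states 0..4, goal at state 4
ACTIONS = [-1, +1]    # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
PLANNING_STEPS = 10   # model-based updates per real step

def step(s, a):
    """Deterministic world: reward 1 only on reaching the goal state."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return (1.0, s2) if s2 == N_STATES - 1 else (0.0, s2)

def greedy(Q, s, rng):
    """Greedy action with random tie-breaking."""
    qs = {a: Q[(s, a)] for a in ACTIONS}
    m = max(qs.values())
    return rng.choice([a for a, q in qs.items() if q == m])

def dyna_q(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned forward model: (s, a) -> (r, s')
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = rng.choice(ACTIONS) if rng.random() < EPS else greedy(Q, s, rng)
            r, s2 = step(s, a)                          # act in the real world
            best = max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])  # direct RL
            model[(s, a)] = (r, s2)                     # update the world model
            for _ in range(PLANNING_STEPS):             # plan from the model
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                pbest = max(Q[(ps2, a_)] for a_ in ACTIONS)
                Q[(ps, pa)] += ALPHA * (pr + GAMMA * pbest - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
policy = [greedy(Q, s, random.Random(1)) for s in range(N_STATES - 1)]
print(policy)  # greedy policy should move right toward the goal
```

Because the model replays previously observed transitions during planning, value information propagates back from the goal far faster than with model-free updates alone; this is the sense in which learning and planning operate as a single process. Adapting to a changed environment (the Dyna-Q case) amounts to overwriting stale `model` entries as new real transitions come in.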