Paper information - An on-line algorithm for dynamic reinforcement learning and planning in reactive environments

An online learning algorithm for reinforcement learning with continually running recurrent networks in nonstationary reactive environments is described. Various kinds of reinforcement are considered as special types of input to an agent living in the environment. The agent's only goal is to maximize the amount of reinforcement received over time. Supervised learning techniques for recurrent networks serve to construct a differentiable model of the environmental dynamics which includes a model of future reinforcement. This model is used for learning goal-directed behavior in an online fashion. The possibility of using the system for planning future action sequences is investigated and this approach is compared to approaches based on temporal difference methods. A connection to metalearning (learning how to learn) is noted.

Jürgen Schmidhuber

[1] Anthony J. Robinson, et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding, 1987, NIPS.

[2] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[3] Michael I. Jordan. Supervised learning and systems with excess degrees of freedom, 1988.

[4] Jürgen Schmidhuber, et al. Recurrent networks adjusted by adaptive critics, 1990.

[5] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[6] Frank Fallside, et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.

[7] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[8] Jürgen Schmidhuber, et al. The neural bucket brigade, 1989.