Paper information - Learning to generate subgoals for action sequences

Learning to generate subgoals for action sequences

Summary form only given. None of the existing learning algorithms for neural networks in time-varying environments addresses the problem of learning to 'divide and conquer'. The author argues that algorithms based on pure gradient descent or on adaptive critic methods are not suitable for dynamic control problems with long time lags between actions and consequences, and that there is a need for algorithms that perform 'compositional learning'. The author discusses a system which solves at least one problem associated with compositional learning: the system learns to generate subgoals. This is done with the help of 'time-bridging' adaptive models that predict the effects of the system's subprograms. An experiment on obstacle avoidance in a two-dimensional environment illustrates the approach.

Jürgen Schmidhuber

[1] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.

[2] T. Sejnowski, et al. Learning Algorithms for Networks with Internal and External Feedback, 1990.

[3] Charles W. Anderson, et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning), 1986.

[4] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.

[5] Jürgen Schmidhuber, et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks, 1989.

[6] Jürgen Schmidhuber. Adaptive Decomposition Of Time, 1991.

[7] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[8] Frank Fallside, et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.

[9] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[10] Jürgen Schmidhuber, et al. Recurrent networks adjusted by adaptive critics, 1990.

[11] Michael I. Jordan. Supervised learning and systems with excess degrees of freedom, 1988.

[12] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.