Learning Algorithms for Networks with Internal and External Feedback

This paper gives an overview of some novel algorithms for reinforcement learning in non-stationary, possibly reactive environments. I have decided to describe many ideas briefly rather than going into great detail on any one idea. The paper is structured as follows: In the first section some terminology is introduced. Then there follow five sections, each headed by a short abstract. The second section describes the entirely local 'neural bucket brigade algorithm'. The third section applies Sutton's TD-methods to fully recurrent, continually running probabilistic networks. The fourth section describes an algorithm based on system identification and on two interacting fully recurrent 'self-supervised' learning networks. The fifth section describes an application of adaptive control techniques to adaptive attentive vision: it demonstrates how 'selective attention' can be learned. Finally, the sixth section criticizes methods based on system identification and adaptive critics, and describes an adaptive subgoal generator.

1 Terminology

External feedback. Consider a neural network receiving inputs from a non-stationary environment and being able to produce actions that may have an influence on the environmental state. Since the new state may cause new inputs for the network, we say that there is external feedback.

Internal feedback. If the network topology is cyclic, then input activations from a given time may alter the way that inputs from later times are processed. In this case there is a potential for the 'representation of state', or 'short-term memory', and we speak of internal feedback.

Dynamic Learning Algorithms and Networks. A problem that requires credit assignment to past activation states is called a dynamic problem. Learning algorithms for handling dynamic problems are called dynamic learning algorithms. Learning algorithms that are not dynamic algorithms are called static algorithms. For instance, all algorithms that require settling into equilibria while the inputs remain stationary are considered to be static algorithms, although the settling process itself is a dynamic one based on internal feedback. If a given network type can be employed for dynamic problems, and if there exists a corresponding learning algorithm, then we sometimes speak of a dynamic network.

The credit assignment problem. If a neural network is supposed to learn externally posed tasks, then it faces Minsky's fundamental credit assignment problem: If performance is not sufficient, then which component of the network, at which time, contributed in which way to the failure? How should critical components change their behavior to increase future performance?

Supervised Learning. A learning task is a …
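Since the distinction between internal and external feedback organizes everything that follows, a minimal sketch may help. The following Python fragment is purely illustrative and not taken from the paper; the toy environment, the network sizes, and the choice of a single output unit are all assumptions:

```python
import numpy as np

# Minimal sketch of a fully recurrent network running continually in closed
# loop with a reactive environment (illustration only; all details assumed).
rng = np.random.default_rng(0)
n_in, n_units = 3, 5                     # assumed numbers of input lines and units
W = rng.normal(scale=0.1, size=(n_units, n_units + n_in))

def env_step(action):
    """Hypothetical reactive environment: the action influences the next
    input, closing the EXTERNAL feedback loop. Returns (input, reward)."""
    return rng.normal(size=n_in) + action, float(action > 0)

y = np.zeros(n_units)                    # activations: the network's internal state
x, _ = env_step(0.0)
for t in range(100):
    # INTERNAL feedback: the previous activations y feed back into every
    # unit, so inputs from earlier times can alter how later inputs are
    # processed ('representation of state', or short-term memory).
    y = np.tanh(W @ np.concatenate([y, x]))
    action = y[-1]                       # treat one unit as the action unit
    x, r = env_step(action)              # EXTERNAL feedback via the environment
```

Deciding how a reward r observed at time t should change the weights that shaped earlier activation states is exactly the dynamic credit assignment problem defined above; the algorithms surveyed in the following sections differ in how they attack it.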

[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.

[2] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[3] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[4] David Zipser, et al. Feature Discovery by Competitive Learning, 1986, Cogn. Sci.

[5] John H. Holland, et al. Properties of the Bucket Brigade, 1985, ICGA.

[6] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[7] Charles W. Anderson, et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning), 1986.

[8] Teuvo Kohonen, et al. Self-Organization and Associative Memory, 1988.

[9] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.

[10] Michael I. Jordan. Supervised learning and systems with excess degrees of freedom, 1988.

[11] R. J. Williams, et al. On the use of backpropagation in associative reinforcement learning, 1988, IEEE 1988 International Conference on Neural Networks.

[12] M. Gherrity, et al. A learning algorithm for analog, fully recurrent neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[13] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks, 1989, Neural Computation.

[14] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[15] Frank Fallside, et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.

[16] Ronald J. Williams, et al. Experimental Analysis of the Real-time Recurrent Learning Algorithm, 1989.

[17] Jürgen Schmidhuber, et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks, 1989.

[18] Z. Schreter, et al. The Neural Bucket Brigade, 1989.

[19] Jürgen Schmidhuber, et al. Reinforcement Learning with Interacting Continually Running Fully Recurrent Networks, 1990.

[20] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.

[21] Jürgen Schmidhuber, et al. Recurrent networks adjusted by adaptive critics, 1990.

[22] Jürgen Schmidhuber, et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments, 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[23] Jürgen Schmidhuber, et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.