Neural Sequence Chunkers

This paper addresses the problem of learning to 'divide and conquer' by meaningful hierarchical adaptive decomposition of temporal sequences. This problem is relevant for time-series analysis as well as for goal-directed learning, particularly if event sequences tend to have hierarchical temporal structure. The first neural systems for recursively chunking sequences are described. These systems are based on a principle called the 'principle of history compression'. This principle essentially says: as long as a predictor is able to predict future environmental inputs from previous ones, no additional knowledge can be obtained by observing these inputs in reality. Only unexpected inputs deserve attention. A focus is on a class of 2-network systems which try to collapse a self-organizing (possibly multi-level) hierarchy of temporal predictors into a single recurrent network. Only those input events that were not expected by the first recurrent net are transferred to the second recurrent net. Therefore the second net receives a reduced description of the input history. It tries to develop internal representations for 'higher-level' temporal structure. These internal representations in turn serve to create additional training signals for the first net, thus helping the first net to create longer and longer 'chunks' for the second net. Experiments show that chunking systems can be superior to conventional training algorithms for recurrent nets.

1 OUTLINE

Section 2 motivates the search for sequence-composing systems by describing major drawbacks of 'conventional' learning algorithms for recurrent networks with time-varying inputs and outputs. Section 3 describes a simple observation which is essential for the rest of this paper: it describes the 'principle of history compression'. This principle essentially says: as long as a predictor is able to predict future environmental inputs from previous ones, no additional knowledge can be obtained by observing these inputs in reality. Only unexpected inputs deserve attention. This principle is of particular interest if typical event sequences have hierarchical temporal structure. Basic schemes for constructing sequence chunking systems based on the principle of history compression are described. Section 4 then describes on-line and off-line versions of a particular 2-network chunking system which tries to collapse a self-organizing (possibly multi-level) predictor hierarchy into a single recurrent network (the automatizer). The idea is to feed everything that is unexpected into a 'higher-level' recurrent net (the chunker). Since the expected things can be derived from the unexpected things by the automatizer, the chunker is fed with a reduced description of the input history. The chunker has a …
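
To make the automatizer/chunker interplay described above concrete, here is a minimal sketch, assuming PyTorch. It is an illustration only, not the paper's implementation: the Predictor class, the GRU cells, the alphabet size, the surprise threshold, the optimizer, and the toy sequence are all assumptions introduced for the sketch, and the paper's additional step of letting the chunker's internal representations supply extra training signals for the automatizer is omitted for brevity.

```python
# Minimal sketch of a 2-network chunking system in the spirit of the
# principle of history compression, assuming PyTorch. Alphabet size,
# hidden size, the surprise threshold, and the optimizer are illustrative
# assumptions, not values from the paper; the mechanism by which the
# chunker's internal state provides additional training targets for the
# automatizer is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 4, 16          # symbol alphabet and hidden-state sizes (assumed)
SURPRISE_THRESHOLD = 0.5       # predicted probability below which an input counts as unexpected


class Predictor(nn.Module):
    """A simple recurrent next-symbol predictor."""

    def __init__(self):
        super().__init__()
        self.cell = nn.GRUCell(VOCAB, HIDDEN)
        self.readout = nn.Linear(HIDDEN, VOCAB)

    def step(self, x, h):
        h = self.cell(x, h)
        return self.readout(h), h          # next-symbol logits, new hidden state


automatizer, chunker = Predictor(), Predictor()
opt = torch.optim.Adam(list(automatizer.parameters()) + list(chunker.parameters()), lr=1e-2)


def train_on_sequence(symbols):
    """One pass over a symbol sequence. The automatizer tries to predict every
    next symbol; only the symbols it fails to predict are forwarded to the
    chunker, which therefore sees a reduced description of the input history."""
    h_a = torch.zeros(1, HIDDEN)           # automatizer state
    h_c = torch.zeros(1, HIDDEN)           # chunker state
    prev_unexpected = None                 # last unexpected symbol (chunker input)
    loss = torch.zeros(())
    for t in range(1, len(symbols)):
        x = F.one_hot(torch.tensor([symbols[t - 1]]), VOCAB).float()
        target = torch.tensor([symbols[t]])

        logits, h_a = automatizer.step(x, h_a)
        loss = loss + F.cross_entropy(logits, target)

        # Principle of history compression: only unexpected inputs deserve attention.
        if torch.softmax(logits, dim=-1)[0, target.item()] < SURPRISE_THRESHOLD:
            if prev_unexpected is not None:
                # The chunker predicts the next unexpected event from previous
                # unexpected events only, i.e. it operates on a shorter sequence.
                logits_c, h_c = chunker.step(prev_unexpected, h_c)
                loss = loss + F.cross_entropy(logits_c, target)
            prev_unexpected = F.one_hot(target, VOCAB).float()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Example: a sequence with predictable filler symbols between occasional
# surprising ones (an assumed toy task).
if __name__ == "__main__":
    sequence = [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0] * 10
    for epoch in range(30):
        print(train_on_sequence(sequence))
```

On such a sequence the automatizer soon learns the predictable filler, so the chunker ends up being trained only on the short subsequence of surprising events, which is the intended compressed description of the history.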
