Learning Unambiguous Reduced Sequence Descriptions

Do you want your neural net algorithm to learn sequences? Do not limit yourself to conventional gradient descent (or approximations thereof). Instead, use your sequence learning algorithm (any will do) to implement the following method for history compression. No matter what your final goals are, train a network to predict its next input from the previous ones. Since only unpredictable inputs convey new information, ignore all predictable inputs, but let all unexpected inputs (plus information about the time steps at which they occurred) become inputs to a higher-level network of the same kind, working on a slower, self-adjusting time scale. Go on building a hierarchy of such networks. This principle reduces the descriptions of event sequences without loss of information, thus easing supervised or reinforcement learning tasks. Alternatively, you may use two recurrent networks to collapse a multi-level predictor hierarchy into a single recurrent net. Experiments show that systems based on these principles can require less computation per time step and far fewer training sequences than conventional training algorithms for recurrent nets. Finally, you can modify the above method so that predictability is defined not in a yes-or-no fashion but in a continuous fashion.

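To make the compression step concrete, here is a minimal Python sketch of one level of history compression, using a simple order-1 table-based predictor as a stand-in for the neural predictor described above. The names NextSymbolPredictor and compress, and the frequency-based prediction rule, are illustrative assumptions for this sketch, not part of the original method, which trains a network by a sequence learning algorithm of your choice.

from collections import defaultdict, Counter

class NextSymbolPredictor:
    """Stand-in for the neural predictor: predicts the most frequent
    successor observed so far for the previous symbol (order-1 model)."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, prev):
        successors = self.counts[prev]
        return successors.most_common(1)[0][0] if successors else None

    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1

def compress(sequence):
    """One level of history compression: keep only the inputs the
    predictor failed to anticipate, together with their time steps."""
    predictor = NextSymbolPredictor()
    reduced = []
    prev = None
    for t, sym in enumerate(sequence):
        if predictor.predict(prev) != sym:   # unexpected input -> new information
            reduced.append((t, sym))
        predictor.update(prev, sym)          # keep training the predictor online
        prev = sym
    return reduced

# Two-level hierarchy: the higher level sees only the (time step, symbol)
# pairs the lower level could not predict.
sequence = list("abababababcabababab")
level1 = compress(sequence)
level2 = compress(level1)
print("original length:", len(sequence))
print("level-1 reduced :", level1)
print("level-2 reduced :", level2)

On a repetitive sequence like the one above, only the first few symbols and the surprising "c" survive to the higher level, together with their time steps, so the higher level operates on a much shorter description of the same history.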