Neural net architectures for temporal sequence processing

I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events, and an associator that uses the short-term memory to classify or predict. My taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar–Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared.

Neural networks have proven to be a promising alternative to traditional techniques for nonlinear temporal prediction tasks (e.g., Curtiss, Brandemuehl, & Kreider, 1992; Lapedes & Farber, 1987; Weigend, Huberman, & Rumelhart, 1992). However, temporal prediction is a particularly challenging problem because conventional neural net architectures and algorithms are not well suited to patterns that vary over time. The prototypical use of neural nets is in structural pattern recognition. In such a task, a collection of features (visual, semantic, or otherwise) is presented to a network, and the network must categorize the input feature pattern as belonging to one or more classes. For example, a network might be trained to classify animal species based on a set of attributes describing living creatures, such as “has tail”, “lives in water”, or “is carnivorous”; or a network could be trained to recognize visual patterns over a two-dimensional pixel array as a letter in {A, B, ..., Z}. In such tasks, the network is presented with all relevant information simultaneously.

In contrast, temporal pattern recognition involves processing patterns that evolve over time. The appropriate response at a particular point in time depends not only on the current input, but potentially on all previous inputs. This is illustrated in Figure 1, which shows the basic framework for a temporal prediction problem. I assume that time is quantized into discrete steps, a sensible assumption because many time series of interest are intrinsically discrete, and continuous series can be sampled at a fixed interval. The input at time t is denoted x(t). For univariate series, this input is a scalar.
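To make the memory-plus-associator decomposition concrete, the sketch below pairs the simplest short-term memory, a tapped delay line that holds the d most recent inputs, with a small feedforward associator trained by backpropagation to predict x(t+1). This is a minimal illustration rather than the configuration used in the experiments reported here; the synthetic series, the window length d, the hidden-layer size, and the learning rate are all assumed values chosen only for the example.

```python
# Minimal sketch: delay-line short-term memory + feedforward associator
# for one-step-ahead prediction of a univariate series. All hyperparameters
# below are illustrative assumptions, not values from the experiments.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic univariate series standing in for, e.g., an exchange-rate series.
t = np.arange(400)
x = np.sin(0.07 * t) + 0.05 * rng.standard_normal(len(t))

d = 8  # delay-line (short-term memory) length

# Short-term memory: each row holds the window x(t-d+1), ..., x(t);
# the prediction target is the next value x(t+1).
X = np.array([x[i:i + d] for i in range(len(x) - d)])
y = x[d:]

# Associator: one hidden layer of tanh units, trained by gradient descent
# on the mean squared one-step-ahead prediction error.
n_hidden = 10
W1 = 0.1 * rng.standard_normal((d, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal(n_hidden)
b2 = 0.0
lr = 0.05

for epoch in range(500):
    h = np.tanh(X @ W1 + b1)      # hidden activations
    pred = h @ W2 + b2            # predicted x(t+1) for every window
    g = (pred - y) / len(y)       # gradient of 0.5 * MSE w.r.t. predictions
    # Backpropagate through the associator.
    W2 -= lr * h.T @ g
    b2 -= lr * g.sum()
    gh = np.outer(g, W2) * (1.0 - h ** 2)
    W1 -= lr * X.T @ gh
    b1 -= lr * gh.sum(axis=0)

pred = np.tanh(X @ W1 + b1) @ W2 + b2
print("mean squared prediction error:", np.mean((pred - y) ** 2))
```

Swapping in a different short-term memory amounts to changing how the rows of X are built from the series (for instance, exponentially decaying traces of past inputs rather than raw delayed values), while the associator and the training loop stay the same.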

[1] José Carlos Príncipe, et al. A Theory for Neural Networks with Time Delays, 1990, NIPS.

[2] Herz. Global analysis of parallel analog networks with retarded feedback, 1991, Physical Review A: Atomic, Molecular, and Optical Physics.

[3] Jonathan Baxter, et al. Learning internal representations, 1995, COLT '95.

[4] John S. Bridle, et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.

[5] Ferdinand Hergert, et al. Improving model selection by nonconvergent methods, 1993, Neural Networks.

[6] Giovanni Soda, et al. Local Feedback Multilayered Networks, 1992, Neural Computation.

[7] James L. McClelland, et al. Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations, 1986.

[8] D. Rumelhart, et al. Predicting sunspots and exchange rates with connectionist networks, 1991.

[9] David Zipser, et al. A learning algorithm for continually running fully recurrent neural networks, 1989.

[10] Catherine Myers. Learning with Delayed Reinforcement Through Attention-Driven Buffering, 1991, Int. J. Neural Syst.

[11] Michael C. Mozer, et al. A Focused Backpropagation Algorithm for Temporal Pattern Recognition, 1989, Complex Syst.

[12] A. A. Mullin, et al. Principles of neurodynamics, 1962.

[13] Eric Wan, et al. Finite Impulse Response Neural Networks for Autoregressive Time Series Prediction, 1993.

[14] Jürgen Schmidhuber, et al. Learning Unambiguous Reduced Sequence Descriptions, 1991, NIPS.

[15] A. Lapedes, et al. Nonlinear Signal Processing Using Neural Networks, 1987.

[16] Michael C. Mozer, et al. Induction of Multiscale Temporal Structure, 1991, NIPS.

[17] Klaus Schulten, et al. Self-organizing maps and adaptive filters, 1991.

[18] Dana H. Ballard, et al. Cortical connections and parallel processing: Structure and function, 1986, Behavioral and Brain Sciences.

[19] Yoshua Bengio, et al. Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge, 1989, NIPS.

[20] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.

[21] John J. Hopfield, et al. Connected-digit speaker-dependent speech recognition using a neural network with time-delayed connections, 1991, IEEE Trans. Signal Process.

[22] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine, 1990.

[23] D. Kleinfeld. Sequential state generation by model neural networks, 1986, Proceedings of the National Academy of Sciences of the United States of America.

[24] Tad Hogg, et al. A Dynamical Approach to Temporal Pattern Processing, 1987, NIPS.

[25] Yves Chauvin, et al. Backpropagation: Theory, architectures, and applications, 1995.

[26] Kanter, et al. Temporal association in asymmetric neural networks, 1986, Physical Review Letters.

[27] Jürgen Schmidhuber, et al. A Fixed Size Storage O(n^3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks, 1992, Neural Computation.

[28] D. Zipser, et al. Learning the hidden structure of speech, 1988, The Journal of the Acoustical Society of America.

[29] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[30] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[31] S. Thomas Alexander, et al. Adaptive Signal Processing, 1986, Texts and Monographs in Computer Science.

[32] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[33] Josef Skrzypek, et al. Synergy of Clustering Multiple Back Propagation Networks, 1989, NIPS.

[34] J. J. Hopfield, et al. Neural computation by concentrating information in time, 1987, Proceedings of the National Academy of Sciences of the United States of America.

[35] Geoffrey E. Hinton, et al. Experiments on Learning by Back Propagation, 1986.

[36] José Carlos Príncipe, et al. The gamma model: A new neural model for temporal processing, 1992, Neural Networks.

[37] Jürgen Schmidhuber, et al. Continuous history compression, 1993.

[38] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[39] Yoshua Bengio, et al. The problem of learning long-term dependencies in recurrent networks, 1993, IEEE International Conference on Neural Networks.

[40] David E. Rumelhart, et al. Predicting the Future: a Connectionist Approach, 1990, Int. J. Neural Syst.

[41] Alexander H. Waibel, et al. The Tempo 2 Algorithm: Adjusting Time-Delays By Supervised Learning, 1990, NIPS.

[42] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks, 1989, Neural Computation.

[43] Jeffrey L. Elman, et al. Interactive processes in speech perception: the TRACE model, 1986.

[44] Paul Smolensky. Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems, 1990, Artif. Intell.

[45] Yann LeCun, et al. Second Order Properties of Error Surfaces: Learning Time and Generalization, 1990, NIPS.