WATCHING THE TRANSIENTS: VIEWING A SIMPLE RECURRENT NETWORK AS A LIMITED COUNTER

Researchers in analog computation theory have shown that a recurrent neural network (RNN) can be built to simulate a Turing machine (Pollack, 1987b; Siegelmann & Sontag, 1995). Recently, we showed that RNNs can be trained to implement some aspects of analog computation theory; in particular, a network can develop trajectories that count symbols (Wiles & Elman, 1995). But what are the implications for psychological models of sequence processing based on RNNs? As a first step toward answering this question, we investigate an RNN in a psycholinguistically motivated task: predicting the next letter in a simple deterministic context-free language with one level of center-embedding. We demonstrate how the network develops a simple coordination between trajectories that enables it to perform limited counting and, in some cases, to generalize to longer strings. We identify and analyze geometrically several properties relevant to this task, including the loss of information that results from approaching attractors, divergence in phase space that is used to split states, and the difficulty of learning temporal dependencies when the input-output probabilities for different input symbols overlap.
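
To make the prediction task concrete, the sketch below illustrates the general setup rather than the paper's own simulation: an Elman-style simple recurrent network is trained to predict the next symbol in strings of the counting language a^n b^n. The symbol set, layer sizes, learning rate, training regime, and test probe are all assumptions chosen for brevity, not details taken from the original study.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation):
# an Elman-style simple recurrent network predicting the next symbol in a^n b^n.
import numpy as np

rng = np.random.default_rng(0)

SYMBOLS = ["a", "b", "#"]           # '#' marks end of string (assumed encoding)
IDX = {s: i for i, s in enumerate(SYMBOLS)}
N_IN = N_OUT = len(SYMBOLS)
N_HID = 5                            # small hidden layer, chosen for illustration

# weights: input->hidden, context(hidden_{t-1})->hidden, hidden->output
W_ih = rng.normal(0, 0.5, (N_HID, N_IN))
W_hh = rng.normal(0, 0.5, (N_HID, N_HID))
W_ho = rng.normal(0, 0.5, (N_OUT, N_HID))
b_h = np.zeros(N_HID)
b_o = np.zeros(N_OUT)

def one_hot(sym):
    v = np.zeros(N_IN)
    v[IDX[sym]] = 1.0
    return v

def make_string(n):
    """A string of the counting language a^n b^n, terminated by '#'."""
    return ["a"] * n + ["b"] * n + ["#"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

LR = 0.2
for epoch in range(5000):
    s = make_string(rng.integers(1, 6))          # train on n = 1..5 (assumed range)
    h = np.zeros(N_HID)                          # context starts at zero
    for t in range(len(s) - 1):
        x, target = one_hot(s[t]), one_hot(s[t + 1])
        h_prev = h
        h = sigmoid(W_ih @ x + W_hh @ h_prev + b_h)
        y = sigmoid(W_ho @ h + b_o)
        # Elman-style training: backpropagate through the current step only,
        # treating the copied-back context as a fixed extra input.
        d_o = (y - target) * y * (1 - y)
        d_h = (W_ho.T @ d_o) * h * (1 - h)
        W_ho -= LR * np.outer(d_o, h)
        b_o -= LR * d_o
        W_ih -= LR * np.outer(d_h, x)
        b_h -= LR * d_h
        W_hh -= LR * np.outer(d_h, h_prev)

# Probe: after seeing a^n, does the net predict 'b' for n steps, then '#'?
# n = 6 lies outside the training range and probes limited generalization.
for n in (3, 6):
    h = np.zeros(N_HID)
    preds = []
    for sym in make_string(n)[:-1]:
        h = sigmoid(W_ih @ one_hot(sym) + W_hh @ h + b_h)
        preds.append(SYMBOLS[int(np.argmax(W_ho @ h + b_o))])
    print(n, "".join(preds))
```

Watching the hidden-state trajectory during the 'a' phase and the 'b' phase of such a run is the kind of analysis the paper pursues: counting shows up as coordinated movement along transients rather than as a discrete counter.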

[1]  J. Elman. Distributed Representations, Simple Recurrent Networks, and Grammatical Structure, 1991.

[2]  Stuart M. Shieber, et al. Foundational issues in natural language processing, 1991.

[3]  Cristopher Moore. Dynamical Recognizers: Real-Time Language Recognition by Analog Computers, 1998, Theor. Comput. Sci.

[4]  John F. Kolen, et al. Exploring the computational capabilities of recurrent neural networks, 1995.

[5]  Janet Wiles, et al. Recurrent Neural Networks Can Learn to Implement Symbol-Sensitive Counting, 1997, NIPS.

[6]  Mark F. St. John, et al. The Story Gestalt: A Model of Knowledge-Intensive Processes in Text Comprehension, 1992, Cogn. Sci.

[7]  Kurt Hornik, et al. A Convergence Result for Learning in Recurrent Neural Networks, 1994, Neural Computation.

[8]  Mike Casey. The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction, 1996, Neural Computation.

[9]  Hava T. Siegelmann, et al. On the Computational Power of Neural Nets, 1995, J. Comput. Syst. Sci.

[10]  Marshall C. Yovits, et al. Ohio State University, 1974, SGAR.

[11]  Ronald J. Williams, et al. Experimental Analysis of the Real-time Recurrent Learning Algorithm, 1989.

[12]  S. Smale, et al. On a theory of computation and complexity over the real numbers: NP-completeness, 1989.

[13]  Pekka Orponen, et al. On the Effect of Analog Noise in Discrete-Time Analog Computations, 1996, Neural Computation.

[14]  James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[15]  Andrew S. Noetzel, et al. Sequence Recognition with Recurrent Neural Networks, 1993.

[16]  Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[17]  James L. McClelland, et al. Graded state machines: the representation of temporal contingencies in feedback networks, 1995.

[18]  Jeffrey L. Elman, et al. A PDP Approach to Processing Center-Embedded Sentences, 1992.

[19]  Jordan B. Pollack. Recursive Distributed Representations, 1990, Artif. Intell.

[20]  Walter S. Stolz. A study of the ability to decode grammatically novel sentences, 1967.

[21]  Jeffrey L. Elman. Finding Structure in Time, 1990, Cogn. Sci.

[22]  Peter Tiňo, et al. Finite State Machines and Recurrent Neural Networks -- Automata and Dynamical Systems Approaches, 1995.

[23]  Jordan B. Pollack, et al. Analysis of Dynamical Recognizers, 1997, Neural Computation.

[24]  Arnold L. Rosenberg. Real-Time Definable Languages, 1967, JACM.

[25]  James L. McClelland, et al. Finite State Automata and Simple Recurrent Networks, 1989, Neural Computation.

[26]  Jeffrey D. Ullman, et al. Introduction to Automata Theory, Languages and Computation, 1979.

[27]  C. Lee Giles, et al. Extracting and Learning an Unknown Grammar with Recurrent Neural Networks, 1991, NIPS.

[28]  Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.

[29]  Johnson Murdoch Hart. Formal properties of local-adjunct languages (LALs), 1972.

[30]  Nick Chater, et al. Toward a connectionist model of recursion in human linguistic performance, 1999.