A Recurrent Neural Network that Learns to Count

Parallel distributed processing (PDP) architectures offer a potentially radical alternative to traditional theories of language processing, which are based on serial computational models. However, learning complex structural relationships in temporal data presents a serious challenge to PDP systems. For example, automata theory dictates that processing strings from a context-free language (CFL) requires a stack or counter memory device. While some PDP models have been hand-crafted to emulate such a device, it is not clear how a neural network might develop one when learning a CFL. This research employs standard backpropagation training techniques for a recurrent neural network (RNN) on the task of learning to predict the next character in a simple deterministic CFL (DCFL). We show that an RNN can learn to recognize the structure of a simple DCFL, and we use dynamical systems theory to identify how the network's states reflect that structure by building counters in phase space. The work is an empirical investigation that complements theoretical analyses of network capabilities, yet is original in the specific configuration of dynamics involved. The application of dynamical systems theory helps us relate the simulation results to theoretical results, and the learning task lets us highlight some issues for understanding dynamical systems that process language with counters.
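
As a concrete illustration of the kind of training setup described above, here is a minimal NumPy sketch of a simple recurrent network trained by backpropagation through time on next-symbol prediction over strings of the form a^n b^n. The alphabet, boundary marker '#', network sizes, and learning rate are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (assumed setup, not the paper's exact one): a simple recurrent
# network trained by backpropagation through time to predict the next symbol in
# strings of the deterministic context-free language a^n b^n.
import numpy as np

rng = np.random.default_rng(0)
SYMS = ['a', 'b', '#']          # '#' marks string boundaries (an assumption)
IDX = {s: i for i, s in enumerate(SYMS)}
N_IN, N_HID, N_OUT = 3, 3, 3    # a small hidden layer suffices for counting

# Weights: input-to-hidden, recurrent hidden-to-hidden, hidden-to-output
W_xh = rng.normal(0, 0.5, (N_HID, N_IN))
W_hh = rng.normal(0, 0.5, (N_HID, N_HID))
W_hy = rng.normal(0, 0.5, (N_OUT, N_HID))
b_h = np.zeros(N_HID)
b_y = np.zeros(N_OUT)

def one_hot(i):
    v = np.zeros(N_IN); v[i] = 1.0
    return v

def make_string(n):
    """A training sequence '#' + a^n + b^n; the target at each step is the next symbol."""
    return ['#'] + ['a'] * n + ['b'] * n

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_on_string(seq, lr=0.1):
    """One forward/backward pass of backpropagation through time over a single string."""
    xs, hs, ps, ts = [], [np.zeros(N_HID)], [], []
    # Forward pass: at step t the network sees seq[t] and predicts seq[t+1]
    for t in range(len(seq) - 1):
        x = one_hot(IDX[seq[t]])
        h = np.tanh(W_xh @ x + W_hh @ hs[-1] + b_h)
        p = softmax(W_hy @ h + b_y)
        xs.append(x); hs.append(h); ps.append(p); ts.append(IDX[seq[t + 1]])
    # Backward pass, accumulating gradients through time
    dW_xh = np.zeros_like(W_xh); dW_hh = np.zeros_like(W_hh)
    dW_hy = np.zeros_like(W_hy); db_h = np.zeros_like(b_h); db_y = np.zeros_like(b_y)
    dh_next = np.zeros(N_HID)
    for t in reversed(range(len(xs))):
        dy = ps[t].copy(); dy[ts[t]] -= 1.0          # softmax + cross-entropy gradient
        dW_hy += np.outer(dy, hs[t + 1]); db_y += dy
        dh = W_hy.T @ dy + dh_next
        dz = (1.0 - hs[t + 1] ** 2) * dh             # back through the tanh nonlinearity
        dW_xh += np.outer(dz, xs[t]); dW_hh += np.outer(dz, hs[t]); db_h += dz
        dh_next = W_hh.T @ dz
    for W, dW in [(W_xh, dW_xh), (W_hh, dW_hh), (W_hy, dW_hy), (b_h, db_h), (b_y, db_y)]:
        W -= lr * dW                                 # in-place gradient step

# Train on short strings drawn at random (n between 1 and 5)
for epoch in range(5000):
    train_on_string(make_string(int(rng.integers(1, 6))))
```

After training such a network, the quantity of interest is not the weights but the hidden-state trajectory: plotting the hidden activations while the network reads a's and then b's should show the state advancing along one direction during the a's and retracing it during the b's, which is the "counter in phase space" that the dynamical systems analysis in the paper identifies.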
