We explore a network architecture introduced by Elman (1988) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t−1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the network has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar, although this correspondence is not necessary for the network to act as a perfect finite-state recognizer. We explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long-distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information.
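The architecture described above can be sketched as a single forward step: the previous hidden state (the "context") is combined with a one-hot encoding of the current element to produce a distribution over possible next elements. The sizes, initialization, and alphabet below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_symbols, n_hidden = 7, 3  # e.g. a small grammar alphabet, few hidden units

W_xh = rng.normal(scale=0.5, size=(n_hidden, n_symbols))  # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))   # context (h at t-1) -> hidden
W_hy = rng.normal(scale=0.5, size=(n_symbols, n_hidden))  # hidden -> output

def srn_step(x_t, h_prev):
    """Combine the current element x_t with the hidden state from the
    previous time-step to predict a distribution over the next element."""
    h = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    logits = W_hy @ h
    p_next = np.exp(logits - logits.max())  # softmax over possible next symbols
    p_next /= p_next.sum()
    return h, p_next

# Run one string through the network, one symbol at a time; the hidden
# state carries information about the sequence seen so far.
h = np.zeros(n_hidden)
for sym in [0, 2, 4]:            # indices of successive sequence elements
    x = np.eye(n_symbols)[sym]   # one-hot encoding of element t
    h, p_next = srn_step(x, h)
```

In training (omitted here), the prediction error at each step would be back-propagated to adjust the weights, which is how the hidden-unit patterns come to reflect the grammar's nodes.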
[1] James L. McClelland. The Case for Interactionism in Language Processing. 1987.
[2] Jing Peng et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories. Neural Computation, 1990.
[3] D. Rumelhart. Learning internal representations by back-propagating errors. 1986.
[4] James L. McClelland et al. Learning Subsequential Structure in Simple Recurrent Networks. NIPS, 1988.
[5] Ronald J. Williams et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1989.
[6] Terrence J. Sejnowski et al. NETtalk: a parallel network that learns to read aloud. 1988.
[7] Geoffrey E. Hinton et al. Learning representations by back-propagating errors. Nature, 1986.
[8] Jeffrey L. Elman et al. Finding Structure in Time. Cognitive Science, 1990.
[9] A. Reber. Implicit learning of artificial grammars. 1967.