Grammar-based connectionist approaches to language

This article describes an approach to connectionist language research that relies on the development of grammar formalisms rather than computer models. From formulations of the fundamental theoretical commitments of connectionism and of generative grammar, it is argued that the two paradigms are mutually compatible. Integrating their basic assumptions yields formal theories of grammar that centrally incorporate a certain degree of connectionist computation. Two such grammar formalisms, Harmonic Grammar (Legendre, Miyata, and Smolensky, 1990a,b) and Optimality Theory (Prince and Smolensky, 1991, 1993), are briefly introduced to illustrate grammar-based approaches to connectionist language research. The strengths and weaknesses of grammar-based research and of more traditional model-based research are argued to be complementary, suggesting a significant role for both strategies in the spectrum of connectionist language research.

This article addresses basic methodological issues arising in connectionist research on language. I will attempt to sketch briefly a lengthy argument begun in Smolensky, Legendre, and Miyata (1992) and presented in detail in Smolensky and Legendre (in progress). In many places I will try to articulate personal viewpoints and, to a very limited degree, justify them. The focus is on two main claims. The first is that two general styles of research both deserve a central place in connectionist approaches to language: model-based research, which is well established, and grammar-based research, which is less so. Each approach, I will argue, has important strengths that the other lacks. The second main claim is that the time has come to stop regarding generative grammar and connectionist approaches to language as incompatible research paradigms. Each has significant potential for contributing to the other.
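The contrast between the two formalisms can be made concrete. In Harmonic Grammar, each candidate structure is scored by a weighted sum of its constraint violations (its harmony), and the candidate maximizing harmony wins; in Optimality Theory, constraints are strictly ranked, and candidates are compared lexicographically, so no number of violations of a lower-ranked constraint can outweigh a single violation of a higher-ranked one. A minimal sketch, with hypothetical constraint names, weights, and violation counts (not analyses from this article):

```python
# Illustrative sketch of candidate evaluation in Harmonic Grammar (HG)
# and Optimality Theory (OT). Constraints C1, C2 and all numbers here
# are hypothetical, chosen only to show how the two schemes can diverge.

# Each candidate maps constraint names to violation counts.
candidates = {
    "cand_A": {"C1": 0, "C2": 2},
    "cand_B": {"C1": 1, "C2": 0},
}

def hg_winner(cands, weights):
    """HG: pick the candidate maximizing harmony,
    where harmony = -(weighted sum of constraint violations)."""
    def harmony(viols):
        return -sum(weights[c] * v for c, v in viols.items())
    return max(cands, key=lambda name: harmony(cands[name]))

def ot_winner(cands, ranking):
    """OT: strict domination -- compare violation vectors
    lexicographically, highest-ranked constraint first."""
    def profile(viols):
        return tuple(viols[c] for c in ranking)
    return min(cands, key=lambda name: profile(cands[name]))

# With weights C1=3, C2=2: harmony(cand_A) = -4, harmony(cand_B) = -3,
# so HG selects cand_B: two C2 violations outweigh one C1 violation.
print(hg_winner(candidates, {"C1": 3, "C2": 2}))  # cand_B

# Under the strict ranking C1 >> C2, cand_A wins outright, since it
# incurs no C1 violations and lower-ranked C2 cannot overturn that.
print(ot_winner(candidates, ["C1", "C2"]))  # cand_A
```

The divergence in the example is the essential point: OT's strict ranking is the limit of HG in which each constraint's weight overwhelms any combination of lower-ranked weights.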
I will suggest a view of the core theoretical commitments of the two paradigms, connectionism and generative linguistics, and argue that these commitments combine to support a coherent and fruitful research program in connectionist-grounded generative grammar. It is my belief, although I will not attempt to justify it in detail here, that the core commitments I identify are indeed consensus beliefs of the connectionist and generative linguistics research communities. Going beyond the core commitments, individual researchers have further commitments which are often not mutually compatible, and these competing scientific hypotheses must of course be adjudicated by theoretical and empirical arguments. But at this level, competition between incompatible hypotheses is readily found among generative grammarians themselves, or among connectionists themselves, as well as between generative linguists and connectionists. Thus it seems to me more accurate to regard the current scientific debates about language as conflicts between individual hypotheses, rather than a war between two unified paradigms, Connectionism and Generative Grammar.

1. Commitments of connectionism

The Parallel Distributed Processing (PDP) school of connectionism is founded, it seems to me, on the following general principles (Rumelhart, McClelland, and the PDP Research Group, 1986):

References

[1] S. A. Ritz et al. (1977). Distinctive features, categorical perception, and probability learning: Some applications of a neural model.

[2] S. Pinker et al. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition.

[3] T. J. Sejnowski et al. (1987). Parallel networks that learn to pronounce English text. Complex Systems.

[4] M. C. Mozer et al. (1991). Rule induction through integrated symbolic and subsymbolic processing. NIPS.

[5] J. B. Pollack (1990). Recursive distributed representations. Artificial Intelligence.

[6] T. Bever et al. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition.

[7] A. Giovagnoli (1995). Connectionist modelling in cognitive neuropsychology: A case study. The Italian Journal of Neurological Sciences.

[8] J. L. McClelland et al. (1986). On learning the past tenses of English verbs: Implicit rules or parallel distributed processing?

[9] N. Chater et al. (1999). Connectionist natural language processing: The state of the art. Cognitive Science.

[10] M. C. Mozer et al. (1996). Mathematical perspectives on neural networks.

[11] A. Prince et al. (1993). Prosodic morphology: Constraint interaction and satisfaction.

[12] L. Talmy (1987). Force dynamics in language and cognition. Cognitive Science.

[13] C. P. Dolan (1989). Tensor manipulation networks: Connectionist and symbolic approaches to comprehension, learning, and planning.

[14] P. Smolensky et al. (1990). Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: An application. Tech. Rep. CU-CS-464-90.

[15] M. McCloskey (1991). Networks and theories: The place of connectionism in cognitive science.

[16] P. Smolensky (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence.

[17] P. Smolensky (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences.

[18] Y. Miyata et al. (1990). Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations.

[19] B. Tesar et al. (1998). Learning optimality-theoretic grammars.

[20] R. Jakobson (1965). Selected Writings I: Phonological Studies.

[21] L. Shastri et al. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences.

[22] D. Zipser et al. (1986). Feature discovery by competitive learning. Cognitive Science.

[23] G. O. Stone (1986). An analysis of the delta rule and the learning of statistical associations.

[24] G. Lakoff (1987). Women, Fire, and Dangerous Things.

[25] P. Smolensky et al. (1998). When is less more? Faithfulness and minimal links in wh-chains.

[26] G. E. Hinton et al. (1991). Adaptive mixtures of local experts. Neural Computation.

[27] S. Pinker et al. (1988). Connections and Symbols.

[28] G. E. Hinton et al. (1983). Optimal perceptual inference.

[29] K. Zubritskaya (1997). Mechanism of sound change in Optimality Theory. Language Variation and Change.

[30] G. E. Hinton (1989). Learning distributed representations of concepts.

[31] P. Smolensky (1997). Constraint interaction in generative grammar II: Local conjunction, or random rules in Universal Grammar.

[32] S. Grossberg et al. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics.

[33] G. Kane (1994). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations; Vol. 2: Psychological and Biological Models.

[34] J. Blevins (1995). The syllable in phonological theory.

[35] P. Smolensky (1996). On the comprehension/production dilemma in child language.

[36] J. L. McClelland et al. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 2: Psychological and Biological Models.

[37] M. C. Mozer et al. (1991). The perception of multiple objects: A connectionist approach. Neural Network Modeling and Connectionism.

[38] P. Smolensky et al. (1983). Schema selection and stochastic inference in modular environments. AAAI.

[39] J. L. McClelland (1993). Toward a theory of information processing in graded, random, and interactive networks.

[40] R. Jakobson (1980). Child Language, Aphasia and Phonological Universals.

[41] A. Prince et al. (1997). Optimality: From neural networks to universal grammar. Science.

[42] D. S. Touretzky et al. (1989). A computational basis for phonology. NIPS.

[43] R. Hudson (1990). Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites.

[44] J. J. Hopfield (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences of the USA.

[45] C. C. Zoll (1998). Parsing below the segment in a constraint-based framework.

[46] P. Smolensky et al. (1999). Lexical and postlexical processes in spoken word production.

[47] G. E. Hinton et al. (1986). Learning and relearning in Boltzmann machines.

[48] J. L. McClelland et al. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations.

[49] P. Barbosa et al. (1998). Is the Best Good Enough? Optimality and Competition in Syntax.

[50] P. Smolensky (1996). The initial state and 'richness of the base' in Optimality Theory.

[51] T. Kohonen (1977). Associative Memory: A System-Theoretical Approach.

[52] P. Smolensky (1986). Information processing in dynamical systems: Foundations of harmony theory.

[53] J. J. Hopfield (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA.

[54] B. Tranel (1994). French liaison and elision revisited: A unified account within Optimality Theory.

[55] G. E. Hinton et al. (1988). A distributed connectionist production system. Cognitive Science.

[56] J. L. McClelland et al. (1989). A distributed, developmental model of word recognition and naming. Psychological Review.

[57] R. Langacker (1983). Foundations of Cognitive Grammar.

[58] J. L. Elman (1990). Finding structure in time. Cognitive Science.

[59] M. I. Jordan (1990). Attractor dynamics and parallelism in a connectionist sequential machine.

[60] J. A. Feldman et al. (1982). Connectionist models and their properties. Cognitive Science.

[61] G. Legendre et al. (1992). Principles for an integrated connectionist/symbolic theory of higher cognition. Tech. Rep. CU-CS-600-92.

[62] I. Biederman et al. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review.

[63] Y. Chauvin et al. (1995). Backpropagation: The basic theory.

[64] B. Tesar et al. (1996). Learnability in Optimality Theory (long version).

[65] T. Plate (1991). Holographic reduced representations: Convolution algebra for compositional distributed representations. IJCAI.