Running Head: Letting Structure Emerge

Letting Structure Emerge: Connectionist and Dynamical Systems Approaches to Understanding Cognition

Connectionist and dynamical systems approaches view human thought, language and behavior as arising, in many cases, as the emergent consequences of simple sub- or non-cognitive processes. We view the entities that serve as the basis for symbolic approaches, including structured statistical approaches, as potentially misleading, illusory constructs that may, in many instances, have no real basis in the processes that give rise to linguistic and cognitive abilities or to the development of these abilities. While probabilistic approaches can be useful in determining what would be optimal under certain assumptions, we suggest that approaches like the connectionist and dynamical systems approaches, which take seriously the processes giving rise to cognition, will be essential in achieving a full understanding of cognition and development.

Emergence of Structure in Cognition

Human thoughts and human utterances have rich and complex structure. In our view, this richness is the emergent consequence of the interplay of simpler, sub- or non-cognitive processes; the propositional and symbolic structures commonly proposed in some cognitive theories are approximate descriptive characterizations that have no actual status in the minds of cognizing and communicating people. Emergence is ubiquitous in nature. Consider an ant hill: it can have an elaborate structure, with a complex network of passageways leading from deep underground to 25 feet into the sky. One might suppose that ants possess a blueprint for creating such structures, but something far simpler is in play [1]. Ants are sensitive to certain gases within their nests; when these gases build up, the ants move grains of dirt to the outside. This activity lets the gases escape and has the byproduct of creating the elaborate structure of the nest. Likewise, we argue, the structures we see in human thought and language may often arise from simple processes.
The present paper contrasts the emergent structure view with the “top-down” approach advocated in the companion article [2]. In that approach, cognizing agents are treated as optimal inferencing machines that come to cognitive tasks with a space of hypotheses and a prior probability distribution over them. Observations provide a means of evaluating the hypotheses and selecting the hypothesis that has the highest posterior probability. Work within the structured probabilistic framework is often thought to address an abstract level of analysis akin to Marr’s computational level [3]. That is, proponents do not claim that the processes that take place in the cognizing agent actually correspond to the processes their models use to determine the posterior probability distribution. Within the top-down approach, one considers process and mechanism at a later stage of scientific investigation.

We argue that there is a danger in framing human cognition as a process of hypothesis selection. If the behaving child or adult is not engaged in the formulation of hypotheses and selection among them, then focusing on these constructs as computational-level descriptions would be misleading. It may lead to an enterprise, much like Chomsky’s universal grammar approach to language, in which researchers focus on searching for entities that may not exist while ignoring the factors that actually shape behavior (see Box 1). A full explanation of behavior cannot ignore the processes that support it. Subtle changes in task situations can drastically change whether a process of hypothesis formation and selection is an appropriate characterization. For example, a recent study [4] has found that people can exploit a causal framing scenario to make normatively correct, explicit inferences in a contingency learning task if they are given ample time to make explicit predictions.
However, when the same contingencies govern events to which participants must respond very quickly, they appear to learn according to a process akin to simple connection weight adjustment. It appears that different mechanisms underlie learning of the very same probabilistic contingencies in the explicit-prediction and quick-response variants of the task, yet the statistical structure of the two tasks, and thus the computational-level analysis of what would be optimal in the two situations, is the same.

Emergence approaches and structured probabilistic approaches share an emphasis on statistical regularities in the learning environment and on variability in human performance. Models on the emergentist side often learn to optimize their probabilistic behavior, e.g., by coming to closely match, in their outputs, the probabilistic structure of the experiences on which they are trained [5, 6]. Thus, the relevance of probability in characterizing human behavior is not in dispute. What is in dispute is the utility of treating cognition as if its goal and outcome were the selection of one or another structured statistical model, whether it be a probabilistic grammar, a mutation hierarchy, or a specific causal Bayes network [7, 8, 9]. From our perspective, the hypotheses, hypothesis spaces and data structures of the probabilistic inference approach are at best approximations of what humans learn. They are not the building blocks of an explanatory theory. Instead, they are sometimes helpful, but often misleading, approximate characterizations of the emergent consequences of the real underlying processes. Likewise, the units over which these hypotheses are predicated (concepts, words, morphemes, syllables, and phonemes) are themselves best seen as sometimes useful but sometimes misleading approximate characterizations (see Box 2).

The remaining sections consider two very different cognitive domains that have been modeled as emergent phenomena.
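The point that a learner driven by simple connection weight adjustment can come to match the probabilistic structure of its experience can be illustrated with a minimal sketch. The code below is not any of the cited models [5, 6]; it is a single-weight, error-driven learner, and the function names, the learning rate, and the outcome probability of 0.75 are all illustrative assumptions. Nothing in it represents or selects among hypotheses, yet its output settles near the outcome probability present in its training experience.

```python
import random


def delta_rule_learner(outcomes, learning_rate=0.01, initial_weight=0.0):
    """Track an outcome probability by error-driven weight adjustment.

    The cue is assumed to be present on every trial (input = 1), so the
    unit's output equals its single weight. Each trial nudges the weight
    toward the observed outcome in proportion to the prediction error.
    """
    weight = initial_weight
    for outcome in outcomes:
        prediction = weight * 1.0           # output of the single unit
        error = outcome - prediction        # prediction error on this trial
        weight += learning_rate * error     # delta-rule weight change
    return weight


def simulate(p_outcome=0.75, n_trials=5000, seed=0):
    """Train on Bernoulli outcomes; return (final weight, observed rate)."""
    rng = random.Random(seed)
    outcomes = [1.0 if rng.random() < p_outcome else 0.0
                for _ in range(n_trials)]
    weight = delta_rule_learner(outcomes)
    observed_rate = sum(outcomes) / n_trials
    return weight, observed_rate


if __name__ == "__main__":
    weight, observed_rate = simulate()
    print(f"learned weight: {weight:.3f}, observed rate: {observed_rate:.3f}")
```

Under these assumptions the learned weight fluctuates around the outcome probability, with a smaller learning rate giving a steadier but slower estimate. The sketch is meant only to show that probability matching can fall out of incremental weight change; it makes no claim about the mechanisms at work in the study cited above [4].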
In both domains, we argue that it is unnecessary, and may even lead research astray, to approach the situations in which the target phenomena occur as ones involving structured probabilistic inference. In Box 3 we list other examples of linguistic, developmental, and cognitive domains that have been captured using similar approaches.

The A-not-B error: Absence of a Hypothesis or Emergent Consequence of the Dynamics of Motor Behavior?

The A-not-B task was introduced by Piaget [10] to measure the development of the object concept, which he framed as the belief that objects exist independent of one’s own actions. Such a belief provides what may seem at first to be a good description of events in the canonical form of the task, which is shown in Figure 1: after searching for an object at one location and then seeing it hidden at a new location, 8-, 9-, and 10-month-old infants reach back to the first location; older infants reach correctly to the new location. Within the object-concept framework, the phenomenon reflects the absence of, or perhaps a low prior probability for, the hypothesis that the object exists independently of the child's actions; the younger child, lacking such a hypothesis, reaches to the place where his actions previously led him to find the object. An alternative to this object-concept based account has been developed within Dynamic Field Theory (DFT) [11, 12]. This account explains the error through general processes of goal-directed reaching (and indeed is a variant of one model of adult reaching behavior). Figure 1 shows the model's dynamic field, which represents the activation within a population of neuron-like units that dynamically represent the direction of a reach. The field integrates multiple sources of relevant information: the immediate events (e.g., hiding the toy), the lids or covers on the table, and past reaches.
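As a rough illustration of how a population of units might integrate these three sources of information, consider the following minimal sketch. It is not the published model [11, 12]: it treats each unit as an independent leaky integrator and omits the interactions among units that a full dynamic field would include. The field size, the positions of A and B, the input strengths, the time constant, and the function names are all hypothetical choices made for illustration, and time is in arbitrary units.

```python
import math

N_UNITS = 41            # units spanning reach directions (assumed resolution)
POS_A, POS_B = 10, 30   # field positions of locations A and B (assumed)


def gaussian_bump(center, amplitude, width=2.0):
    """Input profile centered on one reach direction."""
    return [amplitude * math.exp(-((x - center) ** 2) / (2 * width ** 2))
            for x in range(N_UNITS)]


def simulate_reach(n_prior_reaches_to_a, delay, hide_duration=1.0,
                   tau=1.0, dt=0.01, task_strength=1.0,
                   hiding_strength=4.0, memory_per_reach=0.25):
    """Return 'A' or 'B': the location nearest the most active unit
    once the delay after the hiding event at B has elapsed."""
    # Tonic task input: the two covers on the table, equally salient.
    task = [a + b for a, b in zip(gaussian_bump(POS_A, task_strength),
                                  gaussian_bump(POS_B, task_strength))]
    # Transient specific input: the hiding event at B.
    hiding = gaussian_bump(POS_B, hiding_strength)
    # Memory input: grows with the number of past reaches to A.
    memory = gaussian_bump(POS_A, memory_per_reach * n_prior_reaches_to_a)

    u = [0.0] * N_UNITS                       # activation of each unit
    hide_steps = round(hide_duration / dt)
    n_steps = round((hide_duration + delay) / dt)
    for step in range(n_steps):
        cue_on = step < hide_steps
        for x in range(N_UNITS):
            drive = task[x] + memory[x] + (hiding[x] if cue_on else 0.0)
            # Euler step of tau * du/dt = -u + drive (rise and decay).
            u[x] += (dt / tau) * (-u[x] + drive)

    peak = max(range(N_UNITS), key=lambda x: u[x])
    return "A" if abs(peak - POS_A) < abs(peak - POS_B) else "B"


if __name__ == "__main__":
    for reaches, delay in [(4, 0.0), (4, 3.0), (0, 3.0)]:
        print(reaches, delay, simulate_reach(reaches, delay))
```

With these illustrative settings, four prior reaches to A and no delay yield a reach to B, the same history with a delay of three time units yields a reach back to A, and the same delay with no prior reaches to A yields a reach to B. The error in this sketch therefore depends on the decay of the activation produced by the hiding event and on the accumulated history of reaching, and at no point on a representation of whether the object continues to exist.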
Because the internal activations that lead to a directional reach are themselves dynamic events (with rise times, decay rates, amplitudes, and varying spatial resolution), the model predicts, and experiments have confirmed, fine-grained stimulus, timing, and task effects [11, 12]. Because the explanation derives from general models of goal-directed action that are specific neither to this task nor to this developmental period, the model makes predictions (tested and confirmed) about similar phenomena (and perseverations) at ages younger than and considerably older than the 8 to 12 months of A-not-B errors in the standard task [13, 14]. Indeed, using this model as a guide, experimenters can make the error come and go predictably. This is achieved by changing the delay, by heightening the attention-grabbing properties of the covers or the hiding event, and by increasing and decreasing the number of prior reaches to A [11, 12, 14, 15]. The DFT-based model accounts for a wide range of findings showing that variables unrelated to beliefs about the existence of objects can affect the A-not-B error. The model has also been used to predict (correctly) that a reach back to A will occur in some situations when there is no toy hidden [15]. Furthermore, because the dynamic field is viewed as a motor planning field, and is thus tied to what is known about the body-centric nature of the neural bases of motor plans [16], the model also makes the novel prediction that perseverative errors should disappear if the motor plan needed for reaching to B is distinctly different from that for reaching to A. One experiment achieved this by shifting the posture of the infant [15, 17; Figure 1]. Because no object is needed and because of the importance of the infant’s posture, explanations based on beliefs about objects seem largely irrelevant to understanding A-not-B behavior.
What is developing is a complex dynamic system, and it is this system that produces intelligent behavior, not the concepts, hypotheses, or inferences that some ascribe to the child’s thinking.

Connectionist vs. Structured Probabilistic Approaches to Semantic Cognition

The A-not-B task has not been an explicit fo

[1]  J. Piaget The construction of reality in the child , 1954 .

[2]  F. Keil Constraints on knowledge and cognitive development. , 1981 .

[3]  A. Tversky,et al.  The framing of decisions and the psychology of choice. , 1981, Science.

[4]  Joan L. Bybee Morphology: A study of the relation between meaning and form , 1985 .

[5]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[6]  E. Butterfield,et al.  Are children's rule-assessment classifications invariant across instances of problem types? , 1986, Child development.

[7]  E. Markman,et al.  Categories and induction in young children , 1986, Cognition.

[8]  Geoffrey E. Hinton,et al.  Schemata and Sequential Thought Processes in PDP Models , 1986 .

[9]  James L. McClelland Parallel Distributed Processing: Implications for Cognition and Development , 1988 .

[10]  James L. McClelland,et al.  Learning and Applying Contextual Constraints in Sentence Comprehension , 1990, Artif. Intell..

[11]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[12]  Stephen Jay Gould,et al.  The Panda's Thumb: More Reflections in Natural History , 1990 .

[13]  Rochel Gelman,et al.  First Principles Organize Attention to and Learning About Relevant Data: Number and the Animate-Inanimate Distinction as Examples , 1990, Cogn. Sci..

[14]  M. W. Montgomery,et al.  The Quantitative Description of Action Disorganisation after Brain Damage: A Case Study , 1991 .

[15]  James L. McClelland,et al.  A computational model of semantic memory impairment: modality specificity and emergent category specificity. , 1991, Journal of experimental psychology. General.

[16]  Mark F. St. John,et al.  The Story Gestalt: A Model of Knowledge-Intensive Processes in Text Comprehension , 1992, Cogn. Sci..

[17]  Javier R. Movellan,et al.  Learning Continuous Probability Distributions with Symmetric Diffusion Networks , 1993, Cogn. Sci..

[18]  G. Kane Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .

[19]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[20]  Joel L. Davis,et al.  An Introduction to Neural and Electronic Networks , 1995 .

[21]  D. Lewkowicz,et al.  A dynamic systems approach to the development of cognition and action. , 2007, Journal of cognitive neuroscience.

[22]  James L. McClelland,et al.  Understanding normal and impaired word reading: computational principles in quasi-regular domains. , 1996, Psychological review.

[23]  P vanGeert A dynamic systems approach to the development of cognition and action - Thelen,E, Smith,LB , 1996 .

[24]  Douglas L. T. Rohde,et al.  Language acquisition in the absence of explicit negative evidence: how important is starting small? , 1999, Cognition.

[25]  Linda B. Smith,et al.  Knowing in the context of acting: the task dynamics of the A-not-B error. , 1999, Psychological review.

[26]  Esther Thelen,et al.  Motor memory is a factor in infant perseverative errors , 2000 .

[27]  Jeffrey S. Perry,et al.  Edge co-occurrence in natural images predicts contour grouping performance , 2001, Vision Research.

[28]  Steven Johnson,et al.  Emergence: The Connected Lives of Ants, Brains, Cities, and Software , 2001 .

[29]  J. Hay Lexical frequency in morphology: Is everything relative? , 2001 .

[30]  Han L. J. van der Maas,et al.  Evidence for the Phase Transition from Rule I to Rule II on the Balance Scale Task , 2001 .

[31]  Linda B. Smith,et al.  Tests of a dynamic systems account of the A-not-B error: the influence of prior experience on the spatial memory abilities of two-year-olds. , 2001, Child development.

[32]  E. Thelen,et al.  The dynamics of embodiment: A field theory of infant perseverative reaching , 2001, Behavioral and Brain Sciences.

[33]  David C. Plaut,et al.  A connectionist model of sentence comprehension and production , 2002 .

[34]  James L. McClelland,et al.  Structure and deterioration of semantic memory: a neuropsychological and computational investigation. , 2004, Psychological review.

[35]  Alison Gopnik,et al.  Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers , 2004, Cogn. Sci..

[36]  James L. McClelland,et al.  Semantic Cognition: A Parallel Distributed Processing Approach , 2004 .

[37]  D. Plaut,et al.  Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. , 2004, Psychological review.

[38]  P. Smolensky,et al.  Optimality Theory: Constraint Interaction in Generative Grammar , 2004 .

[39]  D. Plaut,et al.  The processing of root morphemes in Hebrew: Contrasting localist and distributed accounts , 2005 .

[40]  J. Elman Distributed representations, simple recurrent networks, and grammatical structure , 1991, Machine Learning.

[41]  James L. McClelland,et al.  Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition , 2005 .

[42]  Linda B. Smith,et al.  From the lexicon to expectations about kinds: a role for associative learning. , 2005, Psychological review.

[43]  James L. McClelland,et al.  Graded State Machines: The Representation of Temporal Contingencies in Simple Recurrent Networks , 2005, Machine Learning.

[44]  Matthew M Botvinick,et al.  Short-term memory for serial order: a recurrent neural network model. , 2006, Psychological review.

[45]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[46]  Jonathan D. Cohen,et al.  The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. , 2006, Psychological review.

[47]  Linda B Smith,et al.  Young infants reach correctly in A-not-B tasks: on the development of stability and perseveration. , 2006, Infant behavior & development.

[48]  J. Tenenbaum,et al.  Poverty of the Stimulus? A Rational Approach , 2006 .

[49]  J. Tenenbaum,et al.  Word learning as Bayesian inference. , 2007, Psychological review.

[50]  B. Hopkins,et al.  Postural change effects on infants' AB task performance: visual, postural, or spatial? , 2007, Journal of experimental child psychology.

[51]  Mark S. Seidenberg,et al.  Graded semantic and phonological similarity effects in priming: evidence for a distributed connectionist approach to morphology. , 2007, Journal of experimental psychology. General.

[52]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[53]  Harald Maurer,et al.  Paul Smolensky, Géraldine Legendre: The Harmonic Mind. From Neural Computation to Optimality-Theoretic Grammar. Vol. 1: Cognitive Architecture. Vol. 2: Linguistic and Philosophical Implications , 2009 .

[54]  T. Griffiths Probabilistic models of cognition: Exploring the laws of thought , 2009 .

[55]  Joe Pater The harmonic mind : from neural computation to optimality-theoretic grammar , 2009 .

[56]  James L. McClelland,et al.  A connectionist model of a continuous developmental transition in the balance scale task , 2009, Cognition.

[57]  Linda B. Smith,et al.  Cue salience and infant perseverative reaching: tests of the dynamic field theory. , 2009, Developmental science.

[58]  James L. McClelland,et al.  Semantic Cognition: Its Nature, Its Development, and Its Neural Basis , 2008 .

[59]  Thomas L. Griffiths,et al.  Learning phonetic categories by learning a lexicon , 2009 .

[60]  James L. McClelland,et al.  When Should We Expect Indirect Effects in Human Contingency Learning , 2009 .

[61]  J. Tenenbaum,et al.  Structured statistical models of inductive reasoning. , 2009, Psychological review.