Compositional Processing Emerges in Neural Networks Solving Math Problems

A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
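To make the hypothesized composition rule concrete, the sketch below (our own illustration, not code from the paper) evaluates an arithmetic expression by composing the quantities denoted by numerals into a composite value while respecting the order of operations; the function names (tokenize, parse_expr, parse_term, parse_factor) are purely illustrative.

```python
# Minimal sketch (assumption: not the paper's code): composing numeral
# "meanings" (their quantities) into a whole according to a structured
# rule -- here, order of operations, with '*'/'/' binding tighter than '+'/'-'.

def tokenize(expr):
    """Split an arithmetic expression into numerals, operators, and parentheses."""
    return expr.replace("(", " ( ").replace(")", " ) ").split()

def parse_expr(tokens, pos=0):
    """expr := term (('+' | '-') term)*"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    """term := factor (('*' | '/') factor)*"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    """factor := numeral | '(' expr ')'"""
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        return value, pos + 1  # skip the closing ')'
    return float(tokens[pos]), pos + 1  # a numeral's meaning is its quantity

if __name__ == "__main__":
    # '*' is applied before '+', so the composite meaning is 11, not 14.
    print(parse_expr(tokenize("3 + 4 * 2"))[0])    # 11.0
    print(parse_expr(tokenize("(3 + 4) * 2"))[0])  # 14.0
```

The point of the sketch is only that the same numerals yield different composite values depending on the structural rule applied, which is exactly the kind of structure-sensitive composition the trained networks are probed for.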
