Language models show human-like content effects on reasoning

Reasoning is a key ability for an intelligent system. Large language models achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect and depends on our knowledge and beliefs about the content of the reasoning problem. For example, humans reason far more reliably about logical rules that are grounded in everyday situations than about arbitrary rules over abstract attributes. The training experiences of language models similarly endow them with prior expectations that reflect human knowledge and beliefs. We therefore hypothesized that language models would show human-like content effects on abstract reasoning problems. We explored this hypothesis across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task (Wason, 1968). We find that state-of-the-art large language models (with 7 or 70 billion parameters; Hoffmann et al., 2022) reflect many of the same patterns observed in humans across these tasks: like humans, models reason more effectively about believable situations than about unrealistic or abstract ones. Our findings have implications for understanding both these cognitive effects and the factors that contribute to language model performance.
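To make the evaluation setup concrete, the sketch below shows one way such a believable-versus-abstract comparison could be scored for the syllogism task. It is an illustrative outline only, not the paper's actual code: the stimuli, the prompt template, and the score_completion function are hypothetical placeholders for the paper's datasets and for a language model's log-probability interface.

```python
# Minimal sketch (assumed, not the paper's implementation): present matched
# believable and abstract versions of the same logical form and compare how
# often the model's preferred answer is correct in each condition.

SYLLOGISM_TEMPLATE = (
    "Argument:\n{premise1}\n{premise2}\n{conclusion}\n"
    "Is this argument logically valid? Answer:"
)

PROBLEMS = [
    {   # Valid syllogism whose conclusion matches everyday beliefs.
        "content": "believable",
        "premise1": "All guns are weapons.",
        "premise2": "All weapons are dangerous things.",
        "conclusion": "Therefore, all guns are dangerous things.",
        "valid": True,
    },
    {   # Same logical form with arbitrary, abstract terms.
        "content": "abstract",
        "premise1": "All zorps are feps.",
        "premise2": "All feps are wugs.",
        "conclusion": "Therefore, all zorps are wugs.",
        "valid": True,
    },
]


def score_completion(prompt: str, completion: str) -> float:
    """Hypothetical stand-in: log-probability of `completion` given `prompt`.
    Replace with a call to an actual language-model API."""
    raise NotImplementedError


def model_says_valid(problem: dict) -> bool:
    """True if the model assigns higher likelihood to 'valid' than 'invalid'."""
    prompt = SYLLOGISM_TEMPLATE.format(**problem)
    return score_completion(prompt, " valid") > score_completion(prompt, " invalid")


def accuracy_by_content(problems: list) -> dict:
    """Accuracy separately for believable and abstract items, to expose any
    content effect (e.g., higher accuracy on believable material)."""
    results = {}
    for condition in {p["content"] for p in problems}:
        subset = [p for p in problems if p["content"] == condition]
        correct = [model_says_valid(p) == p["valid"] for p in subset]
        results[condition] = sum(correct) / len(correct)
    return results
```

Under this kind of setup, a human-like content effect would appear as a gap between the believable and abstract accuracies returned by accuracy_by_content.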

[1] Hinrich Schütze, et al. Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models, 2020, Proceedings of the National Academy of Sciences.

[2] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, ArXiv.

[3] S. Sreedharan, et al. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change), 2022, ArXiv.

[4] L. Cosmides. The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task, 1989, Cognition.

[5] S. Handley, et al. Using forced choice to test belief bias in syllogistic reasoning, 2014, Cognition.

[6] P. Pollard, et al. On the conflict between logic and belief in syllogistic reasoning, 1983, Memory & Cognition.

[7] Navin Goyal, et al. Are NLP Models really able to Solve Simple Math Word Problems?, 2021, NAACL.

[8] Oyvind Tafjord, et al. Transformers as Soft Reasoners over Language, 2020, IJCAI.

[9] Jackie Chi Kit Cheung, et al. An Analysis of Dataset Overlap on Winograd-Style Tasks, 2020, COLING.

[10] S. Gu, et al. Large Language Models are Zero-Shot Reasoners, 2022, ArXiv.

[11] Luke Zettlemoyer, et al. Surface Form Competition: Why the Highest Probability Answer Isn't Always Right, 2021, EMNLP.

[12] Jesse Dodge, et al. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus, 2021, EMNLP.

[13] Jochen Musch, et al. On belief bias in syllogistic reasoning, 2000, Psychological Review.

[14] Noah D. Goodman, et al. STaR: Bootstrapping Reasoning With Reasoning, 2022, arXiv:2203.14465.

[15] Joseph Hilbe, et al. Data Analysis Using Regression and Multilevel/Hierarchical Models, 2009.

[16] David Schlangen. Norm Participation Grounds Language, 2022, ArXiv.

[17] Jeff Wu, et al. Self-critiquing models for assisting human evaluators, 2022, ArXiv.

[18] Matthew Inglis, et al. Mathematicians and the Selection Task, 2004.

[19] Ha Hong, et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex, 2014, Proceedings of the National Academy of Sciences.

[20] Jonathan D. Cohen, et al. The Computational and Neural Basis of Cognitive Control: Charted Territory and New Frontiers, 2014, Cognitive Science.

[21] B. Jones. Bounded Rationality, 1999.

[22] E. Heit, et al. Assessing the belief bias effect with ROCs: it's a response bias effect, 2010, Psychological Review.

[23] J. Duncan, et al. Integrated Intelligence from Distributed Brain Activity, 2020, Trends in Cognitive Sciences.

[24] Graham Neubig, et al. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering, 2020, Transactions of the Association for Computational Linguistics.

[25] T. Yarkoni, et al. The generalizability crisis, 2019, Behavioral and Brain Sciences.

[26] D. Sperber, et al. The Enigma of Reason, 2017.

[27] Samuel Ritter, et al. Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study, 2017, ICML.

[28] Yejin Choi, et al. Reframing Instructional Prompts to GPTk's Language, 2021, FINDINGS.

[29] Noah D. Goodman, et al. Evaluating Compositionality in Sentence Embeddings, 2018, CogSci.

[30] Richard E. Nisbett, et al. A longitudinal study of the effects of undergraduate training on reasoning, 1990.

[31] R Devon Hjelm, et al. Understanding by Understanding Not: Modeling Negation in Language Models, 2021, NAACL.

[32] N. Chater, et al. Optimal data selection: Revision, review, and reevaluation, 2003, Psychonomic Bulletin & Review.

[33] Christopher D. Manning, et al. Natural Logic for Textual Inference, 2007, ACL-PASCAL@ACL.

[34] Gerd Gigerenzer, et al. Heuristic decision making, 2011, Annual Review of Psychology.

[35] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, ArXiv.

[36] Jeffrey P. Bigham, et al. Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning, 2022, ArXiv.

[37] Jonathan Evans, et al. Rationality and reasoning, 1996.

[38] Yoshua Bengio. The Consciousness Prior, 2017, ArXiv.

[39] A. Staub, et al. Beliefs and Bayesian reasoning, 2017, Psychonomic Bulletin & Review.

[40] J. Fodor, et al. Connectionism and cognitive architecture: A critical analysis, 1988, Cognition.

[41] Markus N. Rabe, et al. LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning, 2021, ICML.

[42] Tom B. Brown, et al. Language Models (Mostly) Know What They Know, 2022, ArXiv.

[43] Samuel J. Gershman, et al. A theory of learning to infer, 2019, bioRxiv.

[44] Jonathan S. Evans, et al. Bias in human reasoning: causes and consequences, 1990, Essays in Cognitive Psychology.

[45] James L. McClelland, et al. A weighted constraint satisfaction approach to human goal-directed decision making, 2021, bioRxiv.

[46] Alexander K. Luria, et al. Towards the Problem of the Historical Nature of Psychological Processes, 1971.

[47] J. S. B. T. Evans, et al. Belief bias in children's reasoning, 1995.

[48] Does studying logic improve logical reasoning?, 2016.

[49] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, ArXiv.

[50] Can language models learn from explanations in context?, 2022, ArXiv.

[51] Falk Lieder, et al. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources, 2019, Behavioral and Brain Sciences.

[52] Eric Schulz, et al. Using cognitive psychology to understand GPT-3, 2022, arXiv:2206.14576.

[53] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[54] Michael Henry Tessler. Understanding belief bias by measuring prior beliefs for a Bayesian model of syllogistic reasoning, 2015.

[55] Jonathan Evans. Heuristic and analytic processes in reasoning, 1984.

[56] Noah D. Goodman, et al. Logic, Probability, and Pragmatics in Syllogistic Reasoning, 2022, Topics in Cognitive Science.

[57] Adam Wierman, et al. Thinking Fast and Slow, 2017, SIGMETRICS Performance Evaluation Review.

[58] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, ArXiv.

[59] C. Speelman, et al. Does mathematics training lead to better logical thinking and reasoning? A cross-sectional assessment from students to professors, 2020, PLoS ONE.

[60] P. Johnson-Laird, et al. Reasoning and a Sense of Reality, 1972.

[61] Daniel Yamins, et al. Explanatory models in neuroscience: Part 1 - taking mechanistic abstraction seriously, 2021, ArXiv.

[62] M. Mitchell. Abstraction and analogy-making in artificial intelligence, 2021, Annals of the New York Academy of Sciences.

[63] Daniel L. K. Yamins, et al. Explanatory models in neuroscience: Part 2 - constraint-based intelligibility, 2021, ArXiv.

[64] P. C. Wason, et al. Reasoning about a Rule, 1968, The Quarterly Journal of Experimental Psychology.

[65] J. DiCarlo, et al. Using goal-driven deep learning models to understand sensory cortex, 2016, Nature Neuroscience.

[66] G. Marcus. Kluge: the haphazard evolution of the human mind, 2009.

[67] Gary Marcus, et al. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence, 2020, ArXiv.

[68] Ellie Pavlick, et al. Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, 2021, NAACL.

[69] Jane X. Wang, et al. Meta-learning in natural and artificial intelligence, 2020, Current Opinion in Behavioral Sciences.

[70] Allen Newell, et al. Physical Symbol Systems, 1980, Cognitive Science.

[71] P. Wason, et al. Natural and contrived experience in a reasoning problem, 1971.

[72] G. Marcus. The Algebraic Mind: Integrating Connectionism and Cognitive Science, 2001.

[73] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, ArXiv.

[74] Predictability and Surprise in Large Generative Models, 2022, ArXiv.

[75] Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022, ArXiv.

[76] Keith J. Holyoak, et al. Pragmatic reasoning schemas, 1985, Cognitive Psychology.

[77] John Clibbens, et al. The Role of Implicit and Explicit Negation in Conditional Reasoning Bias, 1996.

[78] Samuel J. Gershman, et al. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines, 2015, Science.

[79] M. Binz, et al. Heuristics from bounded meta-learned inference, 2020, Psychological Review.

[80] Andrew Kyle Lampinen, et al. Symbolic Behaviour in Artificial Intelligence, 2021, ArXiv.

[81] Jonathan Evans. In two minds: dual-process accounts of reasoning, 2003, Trends in Cognitive Sciences.

[82] Hannaneh Hajishirzi, et al. Cross-Task Generalization via Natural Language Crowdsourcing Instructions, 2021, ACL.

[83] Timo Schick, et al. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP, 2021, Transactions of the Association for Computational Linguistics.

[84] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.

[85] Sebastian Riedel, et al. Language Models as Knowledge Bases?, 2019, EMNLP.

[86] Michael I. Swart, et al. Embodied geometric reasoning: Dynamic gestures during intuition, insight, and proof, 2020, Journal of Educational Psychology.

[87] J. Dean, et al. Emergent Abilities of Large Language Models, 2022, ArXiv.

[88] James L. McClelland, et al. What underlies rapid learning and systematic generalization in humans, 2021, ArXiv.

[89] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, ArXiv.

[90] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.

[91] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, ArXiv.

[92] James L. McClelland, et al. On the control of automatic processes: a parallel distributed processing account of the Stroop effect, 1990, Psychological Review.

[93] A. Tversky, et al. Judgment under Uncertainty: Heuristics and Biases, 1974, Science.

[94] P. Johnson-Laird, et al. Psychology of Reasoning: Structure and Content, 1972.

[95] F. Paas, et al. Cognitive Architecture and Instructional Design, 1998.

[96] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[97] Roger Levy, et al. On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior, 2020, CogSci.

[98] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.

[99] Robert L. Goldstone, et al. Concreteness Fading in Mathematics and Science Instruction: a Systematic Review, 2014.