Testing for Grammatical Category Abstraction in Neural Language Models

The notion of grammatical categories is fundamental to human language. Humans abstract over individual lexical items to form grammatical categories, such as nouns and verbs in English, and this category membership (rather than lexical identity) governs the applicability of linguistic rules (e.g., a noun can head the subject of a verb). The category membership of a new word is rapidly inferred from its linguistic environment: if a speaker of English hears I saw a blick, it is immediately clear that blick is a noun. This knowledge of the novel word's grammatical category furthermore enables speakers to produce sentences such as We like the blick and The blick jumped, even though these new contexts share no lexical material with the context in which blick was first observed. Identifying a word's grammatical category thus licenses the application of rules that operate over that category, allowing generalization beyond the contexts in which the novel word has been observed (Gómez and Gerken, 2000).

Can we find evidence of abstract grammatical categories and human-like category-based generalization in pretrained neural language models? From the perspective of Cognitive Science, category abstraction in pretrained neural models would provide an argument against the need for an innate bias towards categorization (and prespecification of the set of lexical categories) in learners of language. From the perspective of Natural Language Processing, contemporary neural models are known to perform well (near 98% accuracy) on benchmarks for part-of-speech (POS) tagging (Bohnet et al., 2018; He and Choi, 2019), and diagnostic classifiers for probing pretrained models achieve similarly high performance on POS (Tenney et al., 2019). However, it remains an open question whether pretrained models can perform category-based generalization using novel words learned from limited contexts, without being explicitly trained to categorize. This question also connects to the problem of out-of-distribution generalization in neural models of language and to efforts to develop benchmarks for linguistic generalizations that humans are capable of (Kim and Linzen, 2020; Linzen, 2020, i.a.).

To this end, we propose a new method, inspired by human developmental studies, for probing pretrained neural language models, and present experimental results on BERT-large (Devlin et al., 2019). Our method does not require training a separate classifier on top, which lets us bypass the methodological questions raised in the recent literature on the validity of using diagnostic classifiers as probes (Hewitt and Liang, 2019; Voita and Titov, 2020, i.a.).
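As a minimal illustration of classifier-free probing in this spirit (a sketch, not the exact procedure used in the paper), one can compare the log-probabilities that BERT-large's masked-language-modeling head assigns to a word in a noun-licensing frame versus a verb-licensing frame, using the HuggingFace Transformers library (Wolf et al., 2019). The frames, the test words, and the helper mask_fill_logprob below are illustrative assumptions; a genuinely novel word like blick would additionally be out-of-vocabulary, a complication this sketch sidesteps by using known words.

```python
# Sketch: score how readily BERT-large accepts a word in noun- vs.
# verb-licensing frames via its masked-LM head, with no trained classifier.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
model.eval()

def mask_fill_logprob(frame: str, word: str) -> float:
    """Log-probability BERT assigns to `word` at the [MASK] slot in `frame`."""
    inputs = tokenizer(frame, return_tensors="pt")
    # Position of the [MASK] token in the input sequence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    return log_probs[tokenizer.convert_tokens_to_ids(word)].item()

# Illustrative frames: the first licenses a noun, the second a verb.
noun_frame = "I saw a [MASK] yesterday."
verb_frame = "They will [MASK] the ball tomorrow."
for word in ["dog", "throw"]:
    print(f"{word:>5}  noun frame: {mask_fill_logprob(noun_frame, word):6.2f}"
          f"  verb frame: {mask_fill_logprob(verb_frame, word):6.2f}")
```

If the model encodes category information, a noun like dog should score relatively higher in the noun frame and a verb like throw in the verb frame; crucially, this comparison reads the pretrained model's own predictions rather than the output of a separately trained probe.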

[1] Adina Williams, Nikita Nangia, and Samuel R. Bowman. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. NAACL, 2018.

[2] Tal Linzen. How Can We Accelerate Progress Towards Human-like Linguistic Generalization? ACL, 2020.

[3] Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.

[4] Han He and Jinho D. Choi. Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT. FLAIRS, 2019.

[5] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. ICLR, 2019.

[6] Rebecca L. Gómez and LouAnn Gerken. Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 2000.

[7] Peter W. Jusczyk and Richard N. Aslin. Infants' Detection of the Sound Patterns of Words in Fluent Speech. Cognitive Psychology, 1995.

[8] J. Weissenborn et al. Functional Elements in Infants' Speech Processing: The Role of Determiners in the Syntactic Categorization of Lexical Elements. 2004.

[9] Elena Voita and Ivan Titov. Information-Theoretic Probing with Minimum Description Length. EMNLP, 2020.

[10] Bernd Bohnet et al. Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings. ACL, 2018.

[11] Najoung Kim and Tal Linzen. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation. EMNLP, 2020.

[12] Thomas Wolf et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv, 2019.

[13] John Hewitt and Percy Liang. Designing and Interpreting Probes with Control Tasks. EMNLP, 2019.

[14] Ian Tenney et al. What do you learn from context? Probing for sentence structure in contextualized word representations. ICLR, 2019.

[15] P. Jusczyk et al. The head-turn preference procedure for testing auditory perception. 1995.