Abstract: What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. It is not yet known whether linguistic generalization in LLMs could equally well be explained as the result of analogical processes, which can be formalized as similarity operations on stored exemplars. A key shortcoming of prior research is its focus on linguistic phenomena with a high degree of regularity, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a new method for investigating linguistic generalization in LLMs: focusing on GPT-J, we fit cognitive models that instantiate rule-based and analogical learning to the LLM training data and compare their predictions on a set of nonce adjectives with those of the LLM, allowing us to draw direct conclusions regarding underlying mechanisms. As expected, rule-based and analogical models explain the predictions of GPT-J equally well for adjectives with regular nominalization patterns. However, for adjectives with variable nominalization patterns, the analogical model provides a much better match. Furthermore, GPT-J's behavior is sensitive to individual word frequencies, even for regular forms, which is consistent with an analogical account of regular forms but not with a rule-based one. These findings refute the hypothesis that GPT-J's linguistic generalization on adjective nominalization involves rules, suggesting similarity operations on stored exemplars as the underlying mechanism. Overall, our study suggests that analogical processes play a bigger role in the linguistic generalization of LLMs than previously thought.
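The contrast between rules and "similarity operations on stored exemplars" can be made concrete with a toy example. The sketch below applies both kinds of model to the English choice between -ity and -ness for a nonce adjective. It is only an illustration under stated assumptions, not the cognitive models or data used in the study. The following are all hypothetical: the lexicon and its frequencies, the nonce words `pelative` and `tormible`, the word-ending similarity measure, the log-frequency weighting, and the simplified "most specific ending" rule learner.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: (adjective, nominal suffix, corpus frequency).
# Entries and counts are illustrative only, not data from the study.
LEXICON = [
    ("available", "ity", 900),
    ("possible", "ity", 1200),
    ("sensitive", "ity", 400),
    ("productive", "ity", 300),
    ("curious", "ity", 200),
    ("effective", "ness", 350),
    ("attractive", "ness", 120),
    ("nervous", "ness", 150),
    ("serious", "ness", 250),
]


def shared_ending(a: str, b: str) -> int:
    """Length of the longest common word-final substring of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def similarity(probe: str, exemplar: str, c: float = 1.0) -> float:
    """Exemplar similarity that decays exponentially with distance.

    Assumed distance: number of probe characters outside the shared ending.
    """
    distance = len(probe) - shared_ending(probe, exemplar)
    return math.exp(-c * distance)


def analogical_probs(probe, lexicon=LEXICON, c=1.0, use_freq=True):
    """Analogical model: each suffix gets the summed similarity of its exemplars."""
    scores = defaultdict(float)
    for word, suffix, freq in lexicon:
        # Optional (assumed) log-frequency weighting: frequent exemplars count more.
        weight = math.log(1 + freq) if use_freq else 1.0
        scores[suffix] += weight * similarity(probe, word, c)
    total = sum(scores.values())
    return {suffix: score / total for suffix, score in scores.items()}


def rule_probs(probe, lexicon=LEXICON):
    """Simplified rule learner: apply the most specific attested ending.

    The 'rule' for an ending is the proportion of lexicon words with that
    ending that take each suffix; word frequency plays no role.
    """
    for k in range(len(probe), 0, -1):
        ending = probe[-k:]
        matches = [suffix for word, suffix, _ in lexicon if word.endswith(ending)]
        if matches:
            return {s: matches.count(s) / len(matches) for s in set(matches)}
    return {}


if __name__ == "__main__":
    for nonce in ("tormible", "pelative"):
        print(nonce, "rule:", rule_probs(nonce))
        print(nonce, "analogy:", analogical_probs(nonce))
```

For `tormible`, both models favor -ity, mirroring the agreement on regular patterns. For `pelative`, the most specific ending the toy rule learner finds is `tive`, which is split 2:2 between the suffixes. Its output is therefore an even split that would not move if the frequencies changed. The analogical model's output, by contrast, shifts when frequency weighting is switched on, a small-scale version of the frequency sensitivity described above.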

Derivational Morphology Reveals Analogical Generalization in Large Language Models