Abstract: Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most of the explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple form of recurrence further improves alignment. We further support this quantitative brain alignment of our model by reproducing landmark studies in the language neuroscience field, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate more reliably between lexical than syntactic differences, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving improved sample and parameter efficiency over comparable architectures. Our model's estimates of surprisal set a new state of the art in behavioral alignment to human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained shallow feature encoder, with structural priors, combined with a trained decoder to achieve efficient and performant language processing.
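The unit-selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything below is an assumption made for illustration: the contrast (sentences versus a control condition such as non-word strings), the Welch t-statistic, the top-k cutoff, the function name `select_language_units`, and the synthetic activations that stand in for real model responses.

```python
import numpy as np

def select_language_units(sent_acts, ctrl_acts, k):
    """Return indices of the k units most selective for sentences.

    sent_acts, ctrl_acts: arrays of shape (n_stimuli, n_units) holding
    unit activations for sentence and control stimuli (hypothetical inputs).
    Selectivity is scored with a Welch t-statistic (sentences > control).
    """
    mean_diff = sent_acts.mean(axis=0) - ctrl_acts.mean(axis=0)
    var_sent = sent_acts.var(axis=0, ddof=1) / sent_acts.shape[0]
    var_ctrl = ctrl_acts.var(axis=0, ddof=1) / ctrl_acts.shape[0]
    # Small constant guards against division by zero for constant units.
    t_stat = mean_diff / np.sqrt(var_sent + var_ctrl + 1e-12)
    # Sort units by descending t-statistic and keep the top k.
    return np.argsort(t_stat)[::-1][:k]

# Synthetic stand-in for model activations: 50 units, 200 stimuli per condition.
rng = np.random.default_rng(0)
n_stim, n_units = 200, 50
sent = rng.normal(size=(n_stim, n_units))
ctrl = rng.normal(size=(n_stim, n_units))
sent[:, :5] += 2.0  # plant five units that respond more strongly to sentences

selected = select_language_units(sent, ctrl, k=5)
print(sorted(selected.tolist()))
```

In a real pipeline the two activation matrices would come from running the model on the two stimulus sets, and only the selected units would then be compared against brain recordings.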

What Makes Two Language Models Think Alike?

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Towards Measuring Representational Similarity of Large Language Models

Structural Similarities Between Language Models and Neural Response Measurements

On the Impact of Language Selection for Training and Evaluating Programming Language Models

Bridging the Semantic Latent Space Between Brain and Machine: Similarity is All You Need

Perturbed examples reveal invariances shared by language models

The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling

Exploring Linguistic Properties of Monolingual BERTs with Typological Classification among Languages

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Language Model Evaluation Beyond Perplexity

Do Large Language Models Mirror Cognitive Language Processing?

Divergences between Language Models and Human Brains

Do Vision and Language Models Share Concepts? A Vector Space Alignment Study

Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models

Do Neural Language Models Show Preferences for Syntactic Formalisms?

Are All Languages Equally Hard to Language-Model?

Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability

Brains and algorithms partially converge in natural language processing