Abstract:A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: (1) the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and (2) this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and jabberwocky sentences (composed of meaningless pseudo words) and displayed them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous MEG and intracranial EEG. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than jabberwocky. Furthermore, multivariate decoding of normal versus jabberwocky confirmed three dynamic patterns: (1) a phasic pattern following each word, peaking in temporal and parietal areas; (2) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (3) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.SIGNIFICANCE STATEMENT Starting from general linguistic concepts, we make two sets of predictions in neural signals evoked by reading multiword sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition. We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text and performing very well on many natural language processing tasks. Then, using a unique combination of MEG and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.

Deep language algorithms predict semantic comprehension from brain activity

Rule-Based and Word-Level Statistics-Based Processing of Language: Insights from Neuroscience

Brains and algorithms partially converge in natural language processing

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Explaining neural activity in human listeners with deep learning via natural language processing of narrative text

Brain2Word: Decoding Brain Activity for Language Generation

A hierarchy of linguistic predictions during natural language comprehension

Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context

Decoding Brain Activity Associated with Literal and Metaphoric Sentence Comprehension Using Distributional Semantic Models

An information-theoretic model of shallow and deep language comprehension

Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension

Dimensionality and Ramping: Signatures of Sentence Integration in the Dynamics of Brains and Deep Language Models

Shared computational principles for language processing in humans and deep language models

Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension

Quasi-compositional mapping from form to meaning: a neural network-based approach to capturing neural responses during human language comprehension

Integrating Large Language Model, EEG, and Eye-Tracking for Word-Level Neural State Classification in Reading Comprehension

Decoding Linguistic Representations of Human Brain

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

Using cognitive psychology to understand GPT-3

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

Large language models predict human sensory judgments across six modalities