Abstract: Although word predictability is commonly considered an important factor in reading, sophisticated accounts of predictability in theories of reading are lacking. Computational models of reading traditionally use cloze norming as a proxy for word predictability, but precisely what cloze norms capture remains unclear. This study investigates whether large language models (LLMs) can fill this gap. Contextual predictions are implemented via a novel parallel-graded mechanism, where all predicted words at a given position are pre-activated as a function of contextual certainty, which varies dynamically as text processing unfolds. Through reading simulations with OB1-reader, a cognitive model of word recognition and eye-movement control in reading, we compare the model's fit to eye-movement data when using predictability values derived from a cloze task against those derived from LLMs (GPT-2 and LLaMA). Root Mean Square Error between simulated and human eye movements indicates that LLM predictability provides a better fit than cloze. This is the first study to use LLMs to augment a cognitive model of reading with higher-order language processing while proposing a mechanism for the interplay between word predictability and eye movements.

Reading comprehension is a crucial skill that is highly predictive of later success in education. One aspect of efficient reading is our ability to predict what is coming next in the text based on the current context. Although we know predictions take place during reading, the mechanism through which contextual facilitation affects oculomotor behaviour in reading is not yet well understood. Here, we model this mechanism and test different measures of predictability (computational vs. empirical) by simulating eye movements with a cognitive model of reading. Our results suggest that, when implemented with our novel mechanism, a computational measure of predictability provides better fits to eye movements in reading than a traditional empirical measure. With this model, we scrutinize how predictions about upcoming input affect eye movements in reading, and how computational approaches to measuring predictability may support theory testing. Modelling aspects of reading comprehension and testing them against human behaviour contributes to the effort of advancing theory building in reading research. In the longer term, a better understanding of reading comprehension may help improve reading pedagogies, diagnoses and treatments.
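The fit comparison described above rests on Root Mean Square Error between simulated and observed eye-movement measures. The following is a minimal sketch of that comparison, not the study's actual pipeline: the function name `rmse`, the condition labels, and all numeric values are hypothetical and chosen only to illustrate the arithmetic. Any normalization or per-measure aggregation the authors may apply is not reproduced here.

```python
import math


def rmse(simulated, observed):
    """Root Mean Square Error between two equal-length sequences."""
    if len(simulated) != len(observed):
        raise ValueError("sequences must have the same length")
    if not simulated:
        raise ValueError("sequences must not be empty")
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-word durations in ms. These are made-up illustrative
# numbers, NOT data or results from the study.
human = [210.0, 250.0, 190.0, 230.0]
simulations = {
    "condition_a": [230.0, 240.0, 220.0, 260.0],  # e.g. one predictability source
    "condition_b": [215.0, 255.0, 200.0, 225.0],  # e.g. another predictability source
}

# Lower RMSE means the simulated values lie closer to the human values.
fit = {name: rmse(values, human) for name, values in simulations.items()}
best = min(fit, key=fit.get)
```

Under this sketch, the predictability source whose simulation yields the lowest RMSE is the one taken to fit the human data best.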

Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension

Rule-Based and Word-Level Statistics-Based Processing of Language: Insights from Neuroscience

Neural Substrate Underlying the Learning of a Passage with Unfamiliar Vocabulary and Syntax

Language models and brains align due to more than next-word prediction and word-level information

eLife Assessment: Finding Structure During Incremental Speech Comprehension

On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds?

Meta predictive learning model of languages in neural circuits

A hierarchy of linguistic predictions during natural language comprehension

Integrating Large Language Model, EEG, and Eye-Tracking for Word-Level Neural State Classification in Reading Comprehension

Do Large Language Models Mirror Cognitive Language Processing?

Deep language algorithms predict semantic comprehension from brain activity

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension

On the influence of discourse connectives on the predictions of humans and language models

Instruction-tuned large language models misalign with natural language comprehension in humans

Quasi-compositional mapping from form to meaning: a neural network-based approach to capturing neural responses during human language comprehension

Language models outperform cloze predictability in a cognitive model of reading

Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models