Abstract: Although word predictability is commonly considered an important factor in reading, sophisticated accounts of predictability in theories of reading are lacking. Computational models of reading traditionally use cloze norming as a proxy for word predictability, but what cloze norms precisely capture remains unclear. This study investigates whether large language models (LLMs) can fill this gap. Contextual predictions are implemented via a novel parallel-graded mechanism, where all predicted words at a given position are pre-activated as a function of contextual certainty, which varies dynamically as text processing unfolds. Through reading simulations with OB1-reader, a cognitive model of word recognition and eye-movement control in reading, we compare the model's fit to eye-movement data when using predictability values derived from a cloze task against those derived from LLMs (GPT-2 and LLaMA). Root Mean Square Error between simulated and human eye movements indicates that LLM predictability provides a better fit than cloze. This is the first study to use LLMs to augment a cognitive model of reading with higher-order language processing while proposing a mechanism for the interplay between word predictability and eye movements.

Reading comprehension is a crucial skill that is highly predictive of later success in education. One aspect of efficient reading is our ability to predict what is coming next in the text based on the current context. Although we know predictions take place during reading, the mechanism through which contextual facilitation affects oculomotor behaviour in reading is not yet well understood. Here, we model this mechanism and test different measures of predictability (computational vs. empirical) by simulating eye movements with a cognitive model of reading. Our results suggest that, when implemented with our novel mechanism, a computational measure of predictability provides better fits to eye movements in reading than a traditional empirical measure. With this model, we scrutinize how predictions about upcoming input affect eye movements in reading, and how computational approaches to measuring predictability may support theory testing. Modelling aspects of reading comprehension and testing them against human behaviour contributes to the effort of advancing theory building in reading research. In the longer term, more understanding of reading comprehension may help improve reading pedagogies, diagnoses and treatments.
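To make the two technical ideas in the abstract concrete, the following is a minimal sketch, not the OB1-reader implementation. The abstract does not give formulas, so everything specific here is an assumption made for illustration: contextual certainty is taken to be one minus the normalised entropy of the next-word distribution, pre-activation is taken to be probability scaled by that certainty, and the names `contextual_certainty`, `pre_activation`, `rmse` and the parameter `max_boost` are hypothetical. Only the use of Root Mean Square Error as the fit measure comes from the abstract.

```python
import math


def contextual_certainty(probs):
    """Assumed certainty measure: 1 - normalised Shannon entropy.

    `probs` maps each predicted word to its probability (from cloze
    norms or from a language model). Returns a value in [0, 1]:
    0 for a flat distribution, 1 when a single word is predicted.
    """
    ps = [p for p in probs.values() if p > 0]
    if len(ps) <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in ps)
    return 1.0 - entropy / math.log(len(ps))


def pre_activation(probs, max_boost=1.0):
    """Pre-activate all predicted words in parallel.

    Each word's boost is graded by its own probability and by the
    certainty of the context (an illustrative choice, not the
    published formula). `max_boost` is a hypothetical scaling constant.
    """
    certainty = contextual_certainty(probs)
    return {word: max_boost * certainty * p for word, p in probs.items()}


def rmse(simulated, observed):
    """Root Mean Square Error between simulated and human measures."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up distributions for one word position, for illustration only.
flat = pre_activation({"dog": 0.5, "cat": 0.5})
peaked = pre_activation({"dog": 0.9, "cat": 0.1})

# Made-up fixation durations in milliseconds, for illustration only.
fit = rmse([200.0, 250.0], [210.0, 240.0])
```

Under these assumptions, a context that splits its probability evenly between candidates yields no pre-activation, while a peaked context boosts its most likely word the most. A lower `rmse` value would indicate a closer match between simulated and human eye movements.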

One Size Does Not Fit All: The Case for Personalised Word Complexity Models

Difficult for Whom? A Study of Japanese Lexical Complexity

Strong Baselines for Complex Word Identification across Multiple Languages

Lexical Complexity Prediction: An Overview

CompLex: A New Corpus for Lexical Complexity Prediction from Likert Scale Data

The Structural Complexity of Chinese Words and Its Relationship with Word Frequency

Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL

Cross-Lingual Transfer Learning for Complex Word Identification

A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity

Smart Word Suggestions for Writing Assistance

Controlling Text Complexity in Neural Machine Translation

A Context-Aware Approach for the Identification of Complex Words in Natural Language Texts

Using Letter Positional Probabilities to Assess Word Complexity

Larger and more instructable language models become less reliable

Construction of a text complexity grading model for English textbooks in the context of globalization

OCHADAI-KYOTO at SemEval-2021 Task 1: Enhancing Model Generalization and Robustness for Lexical Complexity Prediction

Language models outperform cloze predictability in a cognitive model of reading

ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity

Lexical Complexity Controlled Sentence Generation

Investigating the Contextualised Word Embedding Dimensions Responsible for Contextual and Temporal Semantic Changes

Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity