Abstract: Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eye-tracking data over textual materials that are geared towards behavioral analyses of reading comprehension. We focus on a fine-grained and largely unaddressed task of predicting reading comprehension from eye movements at the level of a single question over a passage. We tackle this task using three new multimodal language models, as well as a battery of prior models from the literature. We evaluate the models' ability to generalize to new textual items, new participants, and the combination of both, in two different reading regimes, ordinary reading and information seeking. The evaluations suggest that although the task is highly challenging, eye movements contain useful signals for fine-grained prediction of reading comprehension. Code and data will be made publicly available.

What problem does this paper attempt to address?

The core issue this paper addresses is whether a reader's comprehension can be assessed from their eye movements during reading. Specifically, the researchers ask whether a participant's eye movements over a single passage can predict whether that participant correctly answers a specific question about it. This fine-grained task has received little attention in previous research and is highly challenging.

### Background and Objectives

Reading comprehension is an indispensable skill in modern society, and educational institutions and commercial companies invest substantial resources in developing tools to assess it. Currently, the most practical assessment method is behavioral, typically reading comprehension questions. However, such tests are time-consuming and costly to create, which limits how many of them exist and how openly they can be shared. Moreover, they rely solely on final answers and cannot track the reader's comprehension as it unfolds during reading.

An alternative approach, proposed in psychology and psycholinguistics, is to decode reading comprehension in real time from eye movements. This approach builds on an extensive literature showing a close relationship between eye movements and real-time language comprehension. In recent years, advances in machine learning and natural language processing have led several studies to predict reading comprehension from eye-movement data. Although some progress has been made, predictive modeling in this area is still in its early stages.

### Contributions

To advance research in this field, the paper makes the following contributions:

1. **Task definition**: Introduces the new, challenging task of predicting whether a participant correctly answers a specific question from their eye movements over a single passage.
2. **Model development**: Proposes three new multimodal language models that combine text and eye-movement data: RoBERTa-QEye, MAG-QEye, and PostFusion-QEye.
3. **Reading regimes**: Studies not only ordinary reading but also information seeking, a common yet under-researched reading regime.
4. **Evaluation**: Systematically evaluates how well the models generalize to new participants, new textual items, and the combination of both.

### Data and Methods

The study uses the OneStop dataset, the largest eye-tracking dataset of native English speakers, containing 19,440 trials from 360 participants. In each trial, a participant reads a paragraph from one of 30 articles and answers a multiple-choice question about it. Participants are divided into two groups: one reads under ordinary reading conditions and the other under information-seeking conditions.

### Model Architecture

1. **RoBERTa-QEye**: Feeds eye-movement data into RoBERTa as an additional input sequence, using a fully connected layer to project the eye-movement features into the word-embedding space.
2. **MAG-QEye**: Builds on the Multimodal Adaptation Gate (MAG) architecture, using eye-movement information to shift the transformer encoder's hidden word representations.
3. **PostFusion-QEye**: Encodes the text and the eye-movement data separately and combines the two representations with a cross-attention mechanism.

### Baseline Models

The study also compares the new models against several baselines, including logistic regression, a CNN, BEyeLSTM, and Eyettention. These models were either designed to predict reading comprehension or can be adapted to this binary classification task.

### Experimental Setup

Model performance is evaluated in three generalization regimes:

1. **New participants**: The training data contains no eye movements from the test participants, but does contain other participants' eye movements on the same textual items.
2. **New textual items**: The training data contains no eye movements on the test items, but does contain the test participants' eye movements on other items.
3. **New participants and new textual items**: The training data contains no eye movements from the test participants or on the test items.

### Conclusion

The results indicate that eye-movement data can yield better predictive performance than text-only baselines in some cases, but the improvement is generally small. This may reflect limitations of current modeling approaches, limitations of the data, or the possibility that eye movements are not closely enough tied to fine-grained reading comprehension processes. The study provides infrastructure for future progress on this problem.
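The RoBERTa-QEye description above (eye-movement features projected into the word-embedding space by a fully connected layer and supplied as an additional input sequence) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `linear` and `build_input_sequence`, the choice of per-word features, and the plain concatenation without separator tokens or position embeddings are all assumptions made for illustration.

```python
"""Sketch of RoBERTa-QEye-style input fusion (hypothetical, stdlib only)."""
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: returns weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_input_sequence(
    word_embeddings: List[Vector],
    eye_features: List[Vector],
    weight: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Project each word's eye-movement features into the word-embedding
    space and append the projected vectors after the word embeddings."""
    projected = [linear(f, weight, bias) for f in eye_features]
    return word_embeddings + projected


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_eye, n_words = 4, 3, 5
    weight = [[rng.gauss(0.0, 0.1) for _ in range(d_eye)] for _ in range(d_model)]
    bias = [0.0] * d_model
    words = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_words)]
    # Hypothetical per-word features, e.g. fixation count, total fixation
    # duration, and a skip indicator, scaled to [0, 1].
    eyes = [[rng.random() for _ in range(d_eye)] for _ in range(n_words)]
    sequence = build_input_sequence(words, eyes, weight, bias)
    print(len(sequence), len(sequence[0]))  # 10 4
```

In a real model the combined sequence would be fed to the RoBERTa encoder and the projection weights would be learned jointly with it; here they are simply random.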
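MAG-QEye is described as using eye movements to shift the encoder's hidden word representations. The sketch below shows one such shift for a single word, following the gating scheme of the original Multimodal Adaptation Gate as we understand it: a ReLU gate computed from the word vector and its eye-movement features, a gated displacement vector, and a scaling factor that caps the displacement at `beta` times the word vector's norm. Where MAG-QEye places the gate in the encoder, its layer normalization and dropout, and the value of `beta` are not given here; the helper names and defaults are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def l2_norm(x: Vector) -> float:
    return math.sqrt(sum(v * v for v in x))


def mag_shift(
    h: Vector,        # hidden representation of one word (d_model)
    e: Vector,        # eye-movement features of that word (d_eye)
    w_gate: Matrix,   # d_model x (d_model + d_eye)
    b_gate: Vector,   # d_model
    w_eye: Matrix,    # d_model x d_eye
    b_shift: Vector,  # d_model
    beta: float = 0.5,
) -> Vector:
    """Shift h by a gated, norm-capped displacement computed from e."""
    # Gate from the concatenation [h; e], passed through a ReLU.
    gate = [max(0.0, z + b) for z, b in zip(matvec(w_gate, h + e), b_gate)]
    # Displacement: gated projection of the eye-movement features plus a bias.
    shift = [g * z + b for g, z, b in zip(gate, matvec(w_eye, e), b_shift)]
    shift_norm = l2_norm(shift)
    if shift_norm == 0.0:
        return list(h)  # nothing to add
    # alpha keeps the displacement's norm at most beta * ||h||.
    alpha = min(beta * l2_norm(h) / shift_norm, 1.0)
    return [hi + alpha * si for hi, si in zip(h, shift)]


if __name__ == "__main__":
    h = [3.0, 4.0]
    e = [1.0]
    w_gate = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    w_eye = [[10.0], [0.0]]
    print(mag_shift(h, e, w_gate, [0.0, 0.0], w_eye, [0.0, 0.0]))  # [5.5, 4.0]
```

The cap matters because eye-movement features are on a different scale from the pretrained word vectors; without it, a large displacement could overwrite the textual information the encoder relies on.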
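For PostFusion-QEye, the summary says only that text and eye movements are encoded separately and then combined by cross-attention. A minimal sketch of that fusion step follows, assuming the text representations serve as queries and the eye-movement representations as keys and values, with mean pooling and a logistic output for "answered correctly". Learned query/key/value projections, multiple heads, and the encoders themselves are omitted, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs


def predict_correct(text_states: List[Vector], eye_states: List[Vector], w: Vector, b: float) -> float:
    """Fuse text and gaze, mean-pool, and return P(answer is correct)."""
    # Assumption: text states are queries; gaze states are keys and values.
    fused = cross_attention(text_states, eye_states, eye_states)
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logit = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 encoded text tokens
    gaze = [[0.5, 0.5], [1.0, -1.0]]             # 2 encoded gaze positions
    print(round(predict_correct(text, gaze, [0.3, -0.2], 0.1), 3))
```

Fusing after separate encoding lets each modality keep its own encoder, at the cost of the text encoder never seeing gaze information during its own processing.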
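Of the baselines listed above, logistic regression is the simplest to illustrate. The sketch trains it with full-batch gradient descent on the mean log loss. The two input features and the four training examples are toy values invented for illustration (not OneStop data), and nothing here reflects the feature set the paper's baseline actually uses.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: List[Vector], y: List[int], lr: float = 1.0, epochs: int = 2000
) -> Tuple[Vector, float]:
    """Full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return 1 (answered correctly) if P >= 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    # Toy data: two hypothetical aggregated gaze features per trial.
    X = [[0.0, 0.1], [0.2, 0.0], [0.8, 1.0], [1.0, 0.9]]
    y = [0, 0, 1, 1]  # 1 = question answered correctly
    w, b = train_logistic_regression(X, y)
    print([predict(w, b, x) for x in X])  # [0, 0, 1, 1]
```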
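The three evaluation regimes described above can be made concrete by holding out a set of participants and a set of items, then routing each trial according to whether its participant, its item, or both are held out. This is a sketch under that assumption, with hypothetical IDs and a hypothetical `split_trials` helper; the paper's actual fold construction may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trial = Tuple[str, str]  # (participant_id, item_id)


def split_trials(
    trials: Iterable[Trial],
    test_participants: Set[str],
    test_items: Set[str],
) -> Dict[str, List[Trial]]:
    """Assign each trial to training or to one of three evaluation regimes."""
    splits: Dict[str, List[Trial]] = {
        "train": [], "new_participant": [], "new_item": [], "new_both": [],
    }
    for participant, item in trials:
        new_p = participant in test_participants
        new_i = item in test_items
        if new_p and new_i:
            key = "new_both"
        elif new_p:
            key = "new_participant"  # item not held out: other readers' data on it may be in train
        elif new_i:
            key = "new_item"  # participant not held out: their other trials may be in train
        else:
            key = "train"
        splits[key].append((participant, item))
    return splits


if __name__ == "__main__":
    participants = ["p1", "p2", "p3", "p4"]
    items = ["i1", "i2", "i3"]
    trials = [(p, i) for p in participants for i in items]
    splits = split_trials(trials, test_participants={"p4"}, test_items={"i3"})
    print({k: len(v) for k, v in splits.items()})
    # {'train': 6, 'new_participant': 2, 'new_item': 3, 'new_both': 1}
```

Trials in `new_participant` share items with training and trials in `new_item` share participants with it, which is exactly what separates the first two regimes from the third.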

Fine-Grained Prediction of Reading Comprehension from Eye Movements
