Fine-Grained Prediction of Reading Comprehension from Eye Movements

Omer Shubi, Yoav Meiri, Cfir Avraham Hadar, Yevgeni Berzak
2024-10-06
Abstract: Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data over textual materials that are geared towards behavioral analyses of reading comprehension. We focus on a fine-grained and largely unaddressed task of predicting reading comprehension from eye movements at the level of a single question over a passage. We tackle this task using three new multimodal language models, as well as a battery of prior models from the literature. We evaluate the models' ability to generalize to new textual items, new participants, and the combination of both, in two different reading regimes, ordinary reading and information seeking. The evaluations suggest that although the task is highly challenging, eye movements contain useful signals for fine-grained prediction of reading comprehension. Code and data will be made publicly available.
Computation and Language
What problem does this paper attempt to address?
The core question this paper addresses is whether a reader's comprehension can be assessed from their eye movements during reading. Specifically, the authors ask whether a participant's comprehension can be predicted, at the level of a single question, from their eye movements over a single passage. This fine-grained task has been largely unaddressed in prior work and is highly challenging.

### Background and Objectives

Reading comprehension is an indispensable skill in modern society, and educational institutions and commercial companies have invested substantial resources in developing reading comprehension assessment tools. Currently, the most practical assessment method is behavioral: readers answer reading comprehension questions. However, this method is time-consuming and costly, which limits how many reading comprehension tests exist and how openly available they are. Moreover, it relies solely on the final answers and cannot track the reader's comprehension in real time during reading.

An alternative proposed in psychology and psycholinguistics is to decode reading comprehension in real time from eye movement data. This view rests on an extensive literature indicating a close relationship between eye movements and real-time language comprehension. In recent years, with advances in machine learning and natural language processing, some studies have attempted to use eye movement data to predict reading comprehension. Although some progress has been made, predictive modeling is still in its early stages.

### Contributions

To advance research in this field, the paper makes the following contributions:

1. **Task Definition**: Introduces a new, challenging task: predicting a participant's comprehension, at the level of a single question, from their eye movements over a single passage.
2. **Model Development**: Proposes three new multimodal language models (RoBERTa-QEye, MAG-QEye, PostFusion-QEye) that combine text and eye movement data.
3. **Reading Scenarios**: Studies not only ordinary reading but also information seeking, a common yet less researched reading regime.
4. **Evaluation**: Conducts systematic evaluations of the models' ability to generalize to new participants, new text items, and the combination of both.

### Data and Methods

The study uses the OneStop dataset, the largest eye movement dataset of native English speakers, containing 19,440 trials from 360 participants. In each trial, a participant reads a paragraph from one of 30 articles and answers a multiple-choice question. Participants are divided into two groups: one tested under ordinary reading conditions and the other under information seeking conditions.

### Model Architecture

1. **RoBERTa-QEye**: Feeds eye movement data into the RoBERTa model as an additional input sequence, projecting the eye movement features into the word embedding space through a fully connected layer.
2. **MAG-QEye**: Based on the MAG architecture, adjusts the hidden word representations of the transformer encoder using eye movement information.
3. **PostFusion-QEye**: Processes text and eye movement data separately and combines them through a cross-attention mechanism.

### Baseline Models

The study also compares several baseline models, including logistic regression, CNN, BEyeLSTM, and Eyettention. These models were either designed for reading comprehension prediction or can be adapted to binary classification.

### Experimental Setup

The study evaluates model performance in three generalization scenarios:

1. **New Participants**: The test participants are unseen during training, but eye movement data from other participants on the same text items is available.
2. **New Text Items**: The test text items are unseen during training, but eye movement data from the same participants on other text items is available.
3. **New Participants and New Text Items**: Neither the test participants nor the test text items are seen during training.

### Conclusion

The results indicate that eye movement data yields better predictive performance than text-only baselines in some cases, but the improvement is generally small. This may be due to limitations of current modeling methods, limitations of the data, or a relationship between eye movement behavior and fine-grained reading comprehension that is not close enough to support stronger prediction. The study provides infrastructure for future work on this problem.
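
### Illustrative Sketch: Generalization Splits

The three evaluation regimes described under Experimental Setup can be illustrated with a small sketch. This is not the authors' code; the summary does not specify how the splits are constructed (for example, how many folds are used), so the function name `split_trials`, the toy identifiers, and the choice of a single held-out set of participants and items are all assumptions made for illustration.

```python
from itertools import product


def split_trials(trials, held_out_participants, held_out_items):
    """Partition (participant, item) trials into a training set and the
    three test regimes. Hypothetical helper, not from the paper."""
    splits = {"train": [], "new_participant": [], "new_item": [], "new_both": []}
    for participant, item in trials:
        unseen_participant = participant in held_out_participants
        unseen_item = item in held_out_items
        if unseen_participant and unseen_item:
            key = "new_both"          # neither participant nor item seen in training
        elif unseen_participant:
            key = "new_participant"   # item seen in training, participant is not
        elif unseen_item:
            key = "new_item"          # participant seen in training, item is not
        else:
            key = "train"
        splits[key].append((participant, item))
    return splits


# Toy data (assumed): 4 participants crossed with 4 text items.
participants = ["p1", "p2", "p3", "p4"]
items = ["i1", "i2", "i3", "i4"]
trials = list(product(participants, items))

splits = split_trials(trials, held_out_participants={"p4"}, held_out_items={"i4"})
for name, subset in splits.items():
    print(name, len(subset))
```

Because every trial is assigned to exactly one bucket, no held-out participant or held-out item appears in the training set, which is the property the three regimes depend on.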