Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction

Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen
2024-06-28
Abstract: Predicting human gaze behavior within computer vision is integral for developing interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models to medical imaging for scanpath prediction remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community. Utilizing a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability among different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions over the CXR images. It sometimes also outperforms humans in terms of redundancy and randomness in the scanpaths.
Image and Video Processing, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the problem of predicting human scanpaths on radiological images. Specifically, the research aims to develop an AI system, MedGaze, that simulates the cognitive process of radiologists and predicts their scanpaths and fixation durations on chest X-rays (CXR). With this technology, the attention-allocation patterns of radiologists during diagnosis can be better understood, which in turn can improve training standards and diagnostic accuracy.

### Research Background and Motivation

1. **Background**:
   - Predicting human gaze behavior is an important problem in computer vision and matters for developing interactive systems that can anticipate user attention.
   - Although methods exist for modeling gaze behavior on natural images, scanpath prediction on radiological images is still an unexplored area.
2. **Motivation**:
   - In medical imaging, especially chest X-rays (CXR), predicting scanpaths is crucial for improving diagnostic accuracy and efficiency.
   - By analyzing how expert radiologists scan these images, advanced training programs can be developed to help novices adopt effective viewing strategies, reduce errors, and improve diagnostic skills.

### Research Objectives

- **Primary Objective**: Develop an AI system (MedGaze) that can simulate the cognitive process of radiologists and predict scanpaths and fixation durations on CXR images.
- **Secondary Objectives**:
  - Improve training standardization.
  - Improve diagnostic accuracy.
  - Enhance human-machine collaboration.

### Method Overview

- **Dataset**: Two public datasets, REFLACX and EGD-CXR, were used, which contain eye-tracking data of multiple radiologists and data of a single radiologist.
- **Two-stage Training**:
  1. **Vision-to-Radiology Report Learning (VR2)**: Representation learning was carried out using the MIMIC dataset to extract medically relevant multi-modal features.
  2. **Vision-Language Cognitive Learning (VLC)**: Combined with large multi-modal models (LMMs) to generate human-like scanpaths and fixation durations.
- **Evaluation Metrics**: Metrics such as the IoU score, CC score, and MultiMatch score were used to compare against existing state-of-the-art methods.

### Main Contributions

- **Performance Improvement**: MedGaze significantly outperforms the existing state-of-the-art methods on both the EGD-CXR and REFLACX datasets.
- **Cross-dataset Generalization Ability**: MedGaze generalizes well across datasets collected from different radiologists.
- **Clinical Relevance**: In an evaluation by expert radiologists, the scanpaths predicted by MedGaze perform well in terms of comprehensiveness and redundancy.

In conclusion, this paper addresses scanpath prediction on radiological images by developing the MedGaze system, providing new tools and techniques for improving the training quality and diagnostic accuracy of radiologists.
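
To make the heatmap-based evaluation metrics mentioned above more concrete, here is a minimal sketch of how IoU and CC scores could be computed between a predicted and a human scanpath. It is not the paper's implementation. The assumptions are: fixations are `(x, y, duration)` tuples, a static heatmap is built by summing duration-weighted Gaussians, IoU is taken over heatmaps binarized at a fixed threshold, and CC is the Pearson correlation of the two heatmaps. The function names, `sigma`, `threshold`, the grid size, and the sample scanpaths are all hypothetical. Only the Python standard library is used so the example runs anywhere.

```python
import math

def fixation_heatmap(fixations, width, height, sigma=2.0):
    """Build a duration-weighted Gaussian heatmap from (x, y, duration) fixations.

    Returns a height x width list of lists, scaled so the peak value is 1.0.
    """
    heat = [[0.0] * width for _ in range(height)]
    for fx, fy, dur in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                heat[y][x] += dur * math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat

def iou(pred, target, threshold=0.5):
    """Intersection over union of two heatmaps binarized at `threshold` (assumed value)."""
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            pb, tb = p >= threshold, t >= threshold
            inter += pb and tb
            union += pb or tb
    return inter / union if union else 0.0

def cc(pred, target):
    """Pearson correlation coefficient between the flattened heatmaps."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    cov = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    var_p = sum((a - mp) ** 2 for a in p)
    var_t = sum((b - mt) ** 2 for b in t)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom else 0.0

# Hypothetical scanpaths: (x, y, duration in seconds) on a 32x32 grid.
human = [(8, 8, 0.4), (20, 12, 0.9), (22, 24, 0.3)]
predicted = [(9, 8, 0.5), (19, 13, 0.8), (24, 22, 0.3)]

h_map = fixation_heatmap(human, 32, 32)
p_map = fixation_heatmap(predicted, 32, 32)
print(f"IoU = {iou(p_map, h_map):.3f}, CC = {cc(p_map, h_map):.3f}")
```

Both scores reach 1.0 when the two heatmaps are identical. IoU depends on the chosen threshold, while CC compares the full intensity distributions, which is one reason the two are usually reported together. MultiMatch, which compares scanpaths as sequences rather than as heatmaps, is not covered by this sketch.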