Abstract: In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers unprecedented opportunities to enhance diagnostic precision and accuracy. Yet the "black box" nature of AI models often limits their integration into clinical practice, where transparency and interpretability are essential. This paper presents a novel system that leverages a Large Multimodal Model (LMM) to bridge the gap between AI predictions and the cognitive processes of radiologists. The system consists of two core modules, Temporally Grounded Intention Detection (TGID) and Region Extraction (RE). The TGID module predicts the radiologist's intentions by analyzing eye gaze fixation heatmap videos and the corresponding radiology reports. The RE module then extracts regions of interest that align with these intentions, mirroring the radiologist's diagnostic focus. This approach introduces a new task, radiologist intention detection, and is the first application of Dense Video Captioning (DVC) in the medical domain. By making AI systems more interpretable and better aligned with radiologists' cognitive processes, the proposed system aims to enhance trust, improve diagnostic accuracy, and support medical education. It also holds potential for automated error correction, guiding junior radiologists, and fostering more effective training and feedback mechanisms. This work sets a precedent for future research in AI-driven healthcare, offering a pathway towards transparent, trustworthy, and human-centered AI systems. We evaluated the model using natural language generation (NLG), time-related, and vision-based metrics, demonstrating superior performance in generating temporally grounded intentions on the REFLACX and EGD-CXR datasets.
The model also demonstrated strong predictive accuracy in overlap scores for medical abnormalities and effective region extraction with high Intersection over Union (IoU), especially in complex cases such as cardiomegaly and edema. These results highlight the system's potential to enhance diagnostic accuracy and support continuous learning in radiology. We are also releasing the source code for our project.

Graphical abstract: Overview of our proposed system, comprising two key submodules: Temporally Grounded Intention Detection (TGID) and Region Extraction (RE). The system processes an eye gaze fixation video overlaid on chest X-ray (CXR) images alongside the corresponding radiology report, ultimately identifying the intended diagnosis and highlighting the associated Regions of Interest (ROI).
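
For readers unfamiliar with the IoU metric mentioned in the abstract, the following is a minimal sketch of how IoU is commonly computed for two axis-aligned boxes. The function name `box_iou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; the abstract does not specify how the regions are represented or how the reported IoU was computed.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    This format is an illustrative assumption, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)   # hypothetical extracted region
    reference = (1, 1, 3, 3)   # hypothetical annotated region
    print(box_iou(predicted, reference))
```

An IoU of 1.0 means the extracted region matches the reference exactly, and 0.0 means the two do not overlap at all.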

ReXTrust: A Model for Fine-Grained Hallucination Detection in AI-Generated Radiology Reports

ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

On the notion of Hallucinations from the lens of Bias and Validity in Synthetic CXR Images

MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models

ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models

ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation

ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

Bridging Human and Machine Intelligence: Reverse-Engineering Radiologist Intentions for Clinical Trust and Adoption

Zero-Resource Hallucination Prevention for Large Language Models

Fact-Checking of AI-Generated Reports

Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability

CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting

Detecting Hallucinations in Virtual Histology with Neural Precursors

Zero-Shot Multi-task Hallucination Detection

FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning

ReXplain: Translating Radiology into Patient-Friendly Video Reports

On Early Detection of Hallucinations in Factual Question Answering