Abstract: In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers new opportunities to enhance diagnostic precision and accuracy. Yet the "black box" nature of AI models often limits their adoption in clinical practice, where transparency and interpretability are essential. This paper presents a novel system that leverages a Large Multimodal Model (LMM) to bridge the gap between AI predictions and the cognitive processes of radiologists. The system consists of two core modules: Temporally Grounded Intention Detection (TGID) and Region Extraction (RE). The TGID module predicts the radiologist's intentions by analyzing eye gaze fixation heatmap videos and the corresponding radiology reports. The RE module then extracts regions of interest that align with these intentions, mirroring the radiologist's diagnostic focus. This approach introduces a new task, radiologist intention detection, and is the first application of Dense Video Captioning (DVC) in the medical domain. By making AI systems more interpretable and better aligned with radiologists' cognitive processes, the proposed system aims to enhance trust, improve diagnostic accuracy, and support medical education. It also holds potential for automated error correction, guidance of junior radiologists, and more effective training and feedback mechanisms. This work sets a precedent for future research in AI-driven healthcare, offering a pathway towards transparent, trustworthy, and human-centered AI systems. We evaluated the model using Natural Language Generation (NLG), time-related, and vision-based metrics, demonstrating superior performance in generating temporally grounded intentions on the REFLACX and EGD-CXR datasets. The model also demonstrated strong predictive accuracy in overlap scores for medical abnormalities and effective region extraction with high Intersection over Union (IoU), especially in complex cases such as cardiomegaly and edema. These results highlight the system's potential to enhance diagnostic accuracy and support continuous learning in radiology. We are also releasing the source code for our project.

Graphical abstract: Overview of our proposed system, comprising two key submodules: Temporally Grounded Intention Detection (TGID) and Region Extraction (RE). The system processes an eye gaze fixation video overlaid on CXR images alongside the corresponding radiology report, ultimately identifying the intended diagnosis and highlighting the associated Regions of Interest (ROI).
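
The abstract reports region extraction quality as Intersection over Union (IoU) without stating how regions are represented. As a minimal sketch, assuming each region is an axis-aligned bounding box given as `(x1, y1, x2, y2)` in pixel coordinates (an assumption for illustration, not a detail confirmed by the paper; the function name `box_iou` is likewise hypothetical), IoU can be computed as follows:

```python
from typing import Tuple

# Assumed representation: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, target: Box) -> float:
    """Return intersection area divided by union area for two boxes."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1 = max(pred[0], target[0])
    iy1 = max(pred[1], target[1])
    ix2 = min(pred[2], target[2])
    iy2 = min(pred[3], target[3])

    # Clamp at zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_target = (target[2] - target[0]) * (target[3] - target[1])
    union = area_pred + area_target - inter

    # Two zero-area boxes have no meaningful overlap; report 0.0.
    return inter / union if union > 0 else 0.0
```

A score of 1.0 means the extracted region coincides exactly with the reference region, and 0.0 means the two do not overlap. If the regions were segmentation masks rather than boxes, the same ratio would be computed over pixel counts instead of rectangle areas.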

Discovery Viewer (DV): Web-Based Medical AI Model Development Platform and Deployment Hub

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine

No-code machine learning in radiology: implementation and validation of a platform that allows clinicians to train their own models

MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

MONAI: An open-source framework for deep learning in healthcare

Health AI Developer Foundations

Bridging Human and Machine Intelligence: Reverse-Engineering Radiologist Intentions for Clinical Trust and Adoption

A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform to Facilitate Healthcare AI Research

MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation

Tesseract-medical imaging: open-source browser-based platform for artificial intelligence deployment in medical imaging

Federated benchmarking of medical artificial intelligence with MedPerf

Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine

Developing Medical AI: a cloud-native audio-visual data collection study

Towards Democratization of Subspeciality Medical Expertise

All-in-one platform for AI R&D in medical imaging, encompassing data collection, selection, annotation, and pre-processing

DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum

2D medical image synthesis using transformer-based denoising diffusion probabilistic model

Generative AI for Medical Imaging: extending the MONAI Framework

Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation

VAI-B: a multicenter platform for the external validation of artificial intelligence algorithms in breast imaging