Abstract: Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **improving the accuracy of radiology reports generated from 3D brain MRI, especially diagnostic reports for acute ischemic stroke (AIS)**. Conventional methods that generate radiology reports directly from diffusion-weighted imaging (DWI) are limited in factual accuracy because they must learn a cross-modal mapping from image to text. Learning this mapping means establishing a joint distribution between images and text, which is difficult, especially for 3D medical images.

To solve this problem, the authors propose a new framework called **Paired Image-domain Retrieval and Text-domain Augmentation (PIRTA)**. PIRTA improves radiology report generation in the following ways:

1. **Avoid the complexity of cross-modal mapping**: PIRTA recasts the cross-modal mapping problem as retrieval of similar DWI images within the same domain, and uses the ground-truth text reports paired with those images to augment the report generation process.
2. **Improve the factual accuracy of generated reports**: By retrieving the images most similar to the query image together with their corresponding radiology reports, PIRTA aims to make the generated reports more accurate and clinically relevant.

### Formula Representation

In conventional methods, two encoders are usually trained for cross-modal mapping:

\[ f_{\text{image}}: X \to Z \]

\[ f_{\text{text}}: Y \to Z \]

where \(X\) represents the MRI image space, \(Y\) represents the radiology report space, and \(Z\) is a shared latent space. The goal is to minimize the distance between the representation of a query image \(x_q\) and that of a report \(y_j\):

\[ d(f_{\text{image}}(x_q), f_{\text{text}}(y_j)) \]

In the PIRTA framework, only one image encoder \(f_{\text{image}}: X \to Z\) is trained, and text generation is augmented by retrieving database images \(x_i\) that are similar to the query image:

\[ d(f_{\text{image}}(x_q), f_{\text{image}}(x_i)) \]

### Experimental Verification

The authors ran experiments on multiple datasets, including internal datasets and external validation datasets. The results show that PIRTA retrieves relevant reports more accurately and generates higher-quality radiology reports than direct image-to-text generation.

In this way, PIRTA both simplifies the learning task and improves the accuracy and clinical usefulness of the generated reports, especially for 3D MRI data.
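The in-domain retrieval step from the formula section can be illustrated with a minimal sketch. It assumes the image embeddings \(f_{\text{image}}(x)\) have already been computed and takes \(d\) to be cosine distance; the summary does not state which distance the authors use, so that choice, the function names (`cosine_distance`, `retrieve_reports`, `build_prompt`), the toy embeddings, and the placeholder report strings are all hypothetical and are not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors.

    Cosine distance is an assumed stand-in for d(., .) in the text.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def retrieve_reports(query_embedding, database, k=2):
    """Return the reports paired with the k database images nearest the query.

    `database` is a list of (image_embedding, report_text) pairs, i.e.
    precomputed f_image(x_i) alongside its ground-truth report.
    """
    ranked = sorted(
        database,
        key=lambda pair: cosine_distance(query_embedding, pair[0]),
    )
    return [report for _, report in ranked[:k]]


def build_prompt(retrieved_reports):
    """Assemble a text prompt that augments generation with retrieved reports.

    The prompt wording is a placeholder, not taken from the paper.
    """
    context = "\n".join(f"- {report}" for report in retrieved_reports)
    return (
        "Reports of similar cases:\n"
        f"{context}\n"
        "Write a report for the query image."
    )


# Toy 3-dimensional embeddings with placeholder reports (illustrative only).
database = [
    ([1.0, 0.0, 0.0], "Toy report A"),
    ([0.9, 0.1, 0.0], "Toy report B"),
    ([0.0, 1.0, 0.0], "Toy report C"),
]
query = [1.0, 0.0, 0.0]  # stands in for f_image(x_q)

top_reports = retrieve_reports(query, database, k=2)
prompt = build_prompt(top_reports)
print(prompt)
```

Because both the query and the database entries pass through the same image encoder, no text encoder or shared image-text latent space is needed at retrieval time; the text enters only through the reports attached to the retrieved images.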

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Spatial Semantic-Preserving Latent Space Learning for Accelerated DWI Diagnostic Report Generation

An Inclusive Task-Aware Framework for Radiology Report Generation

A Semi-Supervised Learning Framework to Leverage Proxy Information for Stroke MRI Analysis

Natural Language Processing of Radiology Reports to Detect Complications of Ischemic Stroke

Enhancing clinical MRI Perfusion maps with data-driven maps of complementary nature for lesion outcome prediction

Framework to generate perfusion map from CT and CTA images in patients with acute ischemic stroke: A longitudinal and cross-sectional study

AutoRG-Brain: Grounded Report Generation for Brain MRI

Identifying stroke diagnosis-related features from medical imaging reports to improve clinical decision-making support

Deep into the Brain: Artificial Intelligence in Stroke Imaging

Deep Learning-Based High-Resolution Magnetic Resonance Angiography (MRA) Generation Model for 4D Time-Resolved Angiography with Interleaved Stochastic Trajectories (TWIST) MRA in Fast Stroke Imaging

Perfusion parameter map generation from TOF-MRA in stroke using generative adversarial networks

Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

MRI Generated From CT for Acute Ischemic Stroke Combining Radiomics and Generative Adversarial Networks

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Automated Generation of Radiologic Descriptions on Brain Volume Changes from T1-Weighted MR Images: Initial Assessment of Feasibility

Leveraging Spatial Information in Radiology Reports for Ischemic Stroke Phenotyping

Brain MRI-to-PET Synthesis using 3D Convolutional Attention Networks

A Radiomic-based Method for Predicting the Prognosis of Ischemic Stroke from Diffusion-weighted Imaging Images

Perfusion Maps Acquired From Dynamic Angiography MRI Using Deep Learning Approaches

Disease-oriented image embedding with pseudo-scanner standardization for content-based image retrieval on 3D brain MRI