Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye
2024-11-23
Abstract: Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.
Computer Vision and Pattern Recognition,Machine Learning,Image and Video Processing
What problem does this paper attempt to address?
The paper aims to **improve the accuracy of radiology reports generated from 3D brain MRI, particularly diagnostic reports for acute ischemic stroke (AIS)**. Conventional methods that generate reports directly from diffusion-weighted imaging (DWI) are limited in factual accuracy because they must learn a cross-modal mapping from image to text. Learning that mapping requires modeling a joint distribution over images and text, which is especially difficult for 3D medical images.

To address this, the authors propose a new framework called **Paired Image-domain Retrieval and Text-domain Augmentation (PIRTA)**. PIRTA improves report generation in two ways:

1. **Avoid the complexity of cross-modal mapping**: PIRTA recasts the cross-modal mapping problem as retrieval of similar DWI images within the image domain, and uses the ground-truth text reports paired with those images to augment the report generation process.
2. **Improve the factual accuracy of generated reports**: By retrieving the images most similar to the query image together with their paired radiology reports, PIRTA makes the generated reports more accurate and clinically relevant.

### Formula Representation

In traditional methods, two encoders are usually trained for cross-modal mapping:

\[ f_{\text{image}}: X \to Z \]

\[ f_{\text{text}}: Y \to Z \]

where \(X\) represents the MRI image space, \(Y\) represents the radiology report space, and \(Z\) is a shared latent space. The goal is to minimize the distance between the representation of a query image \(x_q\) and that of a text report \(y_j\):

\[ d(f_{\text{image}}(x_q), f_{\text{text}}(y_j)) \]

In the PIRTA framework, only one image encoder \(f_{\text{image}}: X \to Z\) is trained, and text generation is augmented by retrieving similar images \(x_i\) from the database:

\[ d(f_{\text{image}}(x_q), f_{\text{image}}(x_i)) \]

### Experimental Verification

The authors ran experiments on multiple datasets, including internal datasets and external validation datasets. The results show that PIRTA can accurately retrieve relevant reports from 3D DWI images and generates radiology reports with higher accuracy than direct image-to-text generation using state-of-the-art multimodal language models.

In this way, PIRTA both simplifies the learning task and improves the accuracy and clinical usefulness of the generated reports, especially for 3D MRI data.
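The in-domain retrieval step described under Formula Representation can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine distance used for \(d\), the toy three-dimensional embeddings standing in for \(f_{\text{image}}(x_i)\), the placeholder report strings, and the helper names `cosine_distance`, `retrieve_reports`, and `build_prompt` are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Assumed choice of d: cosine distance, 1 - cos(u, v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def retrieve_reports(query_embedding, database, k=2):
    """Return the reports paired with the k database images nearest the query.

    `database` is a list of (image_embedding, report_text) pairs; each
    embedding stands in for f_image(x_i) and is compared only against
    other image embeddings, never against text.
    """
    ranked = sorted(
        database,
        key=lambda pair: cosine_distance(query_embedding, pair[0]),
    )
    return [report for _, report in ranked[:k]]


def build_prompt(retrieved_reports):
    """Hypothetical text-domain augmentation: pass retrieved reports as context."""
    context = "\n".join(f"- {report}" for report in retrieved_reports)
    return f"Reports of similar cases:\n{context}\nWrite a report for the query image."


# Toy database: hypothetical embeddings with placeholder report text.
database = [
    ([1.0, 0.0, 0.0], "report A"),
    ([0.9, 0.1, 0.0], "report B"),
    ([0.0, 1.0, 0.0], "report C"),
    ([0.0, 0.0, 1.0], "report D"),
]
query = [1.0, 0.05, 0.0]  # stands in for f_image(x_q)

top_reports = retrieve_reports(query, database, k=2)
prompt = build_prompt(top_reports)
print(prompt)
```

Because both sides of the distance are image embeddings, no text encoder is needed at retrieval time; the paired reports enter only as context for the generation step.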