Abstract: The objective of Radiology Report Generation (RRG) is to automatically generate coherent textual analyses of diseases based on radiological images, thereby alleviating the workload of radiologists. Current AI-based methods for RRG primarily focus on modifications to the encoder-decoder model architecture. To advance these approaches, this paper introduces an Organ-Regional Information Driven (ORID) framework which can effectively integrate multi-modal information and reduce the influence of noise from unrelated organs. Specifically, building on LLaVA-Med, we first construct an RRG-related instruction dataset to improve organ-regional diagnosis description ability and obtain LLaVA-Med-RRG. After that, we propose an organ-based cross-modal fusion module to effectively combine the information from the organ-regional diagnosis description and the radiology image. To further reduce the influence of noise from unrelated organs on radiology report generation, we introduce an organ importance coefficient analysis module, which leverages a Graph Neural Network (GNN) to examine the interconnections of the cross-modal information of each organ region. Extensive experiments and comparisons with state-of-the-art methods across various evaluation metrics demonstrate the superior performance of our proposed method.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to automatically generate accurate and reliable radiology reports to reduce the workload of radiologists. Specifically, the paper proposes the Organ-Regional Information Driven (ORID) framework, which aims to improve the quality and accuracy of radiology report generation by effectively integrating multi-modal information and reducing the influence of noise from irrelevant organs.

### Problem Background

Radiology Report Generation (RRG) aims to automatically generate coherent disease-analysis texts based on radiology images, thereby reducing the workload of radiologists. However, existing AI methods mainly focus on improving the encoder-decoder model architecture and fail to fully integrate detailed organ-regional information, which is crucial for generating comprehensive and accurate radiology reports.

### Main Challenges

1. **Complexity and Diversity**: Radiology images usually emphasize specific small areas that are associated with diseases, so specialized methods are required to capture and describe these subtle differences.
2. **Noise Interference**: Information from irrelevant organs may introduce noise, affecting the accuracy and specificity of the report.
3. **Multi-modal Fusion**: There is a need to effectively fuse information from image and text modalities to generate high-quality reports.

### Solutions

To address these challenges, the paper proposes the ORID framework. Its main contributions include:

1. **Constructing an RRG-related Instruction Dataset**: Based on LLaVA-Med, a dataset containing approximately 10,000 question-answer pairs covering 4,000 radiology images was constructed to improve the organ-regional diagnosis-description ability, and the LLaVA-Med-RRG model was developed.
2. **Organ-Regional Information Driven Framework (ORID)**: It includes an organ-based cross-modal fusion module and an organ importance coefficient analysis module, which can effectively integrate multi-modal information and reduce the influence of noise from irrelevant organs.
3. **Experimental Verification**: Through extensive experiments and comparison with existing state-of-the-art methods, the superior performance of the proposed ORID framework on two publicly available radiology report generation benchmarks was demonstrated.

### Method Overview

- **LLaVA-Med-RRG**: Enhance LLaVA-Med through instruction tuning to make it more suitable for processing radiology images.
- **Organ-Regional Cross-Modal Fusion Module (OCF)**: Combine organ-regional image information and diagnosis-description features to generate fine-grained cross-modal features.
- **Organ Importance Coefficient Analysis Module (OICA)**: Use a Graph Neural Network (GNN) to analyze the relationships between different organ regions and evaluate the importance coefficient of each organ region.
- **Radiology Report Generation Module**: Use an encoder-decoder model to generate the final radiology report and introduce a consistency-constraint loss to align image features and report features.

Through these methods, the ORID framework can generate radiology reports more accurately, especially in organ-level disease analysis.
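The fusion and importance-weighting steps in the method overview can be illustrated with a toy sketch. This is not the paper's implementation: the averaging fusion, the cosine-similarity graph, the single unlearned message-passing round, the softmax scoring, and every name and number below (`fuse`, `message_pass`, `importance_coefficients`, `scorer`, the feature values) are assumptions chosen only to show the general idea of down-weighting organs that contribute little.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def fuse(image_feat, text_feat):
    """Toy cross-modal fusion: element-wise average (assumption, not the OCF module)."""
    return [(a + b) / 2 for a, b in zip(image_feat, text_feat)]

def message_pass(nodes):
    """One round of similarity-weighted neighbor aggregation over a fully
    connected organ graph. A real GNN layer would use learned weights."""
    updated = []
    for u in nodes:
        # Edge weight = non-negative cosine similarity; the self-loop has weight 1.
        weights = [max(cosine(u, v), 0.0) for v in nodes]
        total = sum(weights)
        if total == 0:
            updated.append(list(u))  # isolated zero vector: leave unchanged
            continue
        updated.append([
            sum(w * v[k] for w, v in zip(weights, nodes)) / total
            for k in range(len(u))
        ])
    return updated

def importance_coefficients(nodes, scorer):
    """Score each organ node with a (hypothetical) scoring vector, then apply
    a numerically stable softmax so the coefficients sum to 1."""
    scores = [sum(a * b for a, b in zip(n, scorer)) for n in nodes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical per-organ features; real features would come from the image
# encoder and from the diagnosis descriptions produced by LLaVA-Med-RRG.
organs = ["lung", "heart", "bone"]
image_feats = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.4], [0.0, 1.0, 0.1]]
text_feats = [[0.9, 0.1, 0.6], [0.7, 0.3, 0.5], [0.1, 0.9, 0.0]]

fused = [fuse(i, t) for i, t in zip(image_feats, text_feats)]
nodes = message_pass(fused)
coeffs = importance_coefficients(nodes, scorer=[1.0, -1.0, 0.5])

# Scale each organ's features by its coefficient before report decoding.
weighted = [[c * x for x in n] for c, n in zip(coeffs, nodes)]

for name, c in zip(organs, coeffs):
    print(f"{name}: {c:.3f}")
```

In this sketch, an organ whose fused features align poorly with the scoring vector receives a smaller coefficient, so its features contribute less to whatever consumes `weighted`. In a trained model the scoring vector and the graph layer would be learned rather than fixed by hand.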

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

An Organ-aware Diagnosis Framework for Radiology Report Generation

An Inclusive Task-Aware Framework for Radiology Report Generation

A Systematic Review of Deep Learning-based Research on Radiology Report Generation

Intensive Vision-guided Network for Radiology Report Generation

Visual-Linguistic Causal Intervention for Radiology Report Generation

Act Like a Radiologist: Radiology Report Generation across Anatomical Regions

ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning

Bootstrapping Large Language Models for Radiology Report Generation

Reinforced visual interaction fusion radiology report generation

Scene Graph Aided Radiology Report Generation

Multifocal region-assisted cross-modality learning for chest X-ray report generation

AutoRG-Brain: Grounded Report Generation for Brain MRI

A label information fused medical image report generation framework

Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation

Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts

Large Language Model with Region-guided Referring and Grounding for CT Report Generation

A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data

KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models

A Self-Guided Framework for Radiology Report Generation

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation