Large Language Model with Region-guided Referring and Grounding for CT Report Generation

Zhixuan Chen, Yequan Bie, Haibo Jin, Hao Chen

2024-11-23

Abstract: Computed tomography (CT) report generation is crucial to assist radiologists in interpreting CT volumes, which can be time-consuming and labor-intensive. Existing methods primarily consider only the global features of the entire volume, so they struggle to focus on specific regions and may miss abnormalities. To address this issue, we propose Reg2RG, the first region-guided referring and grounding framework for CT report generation, which enhances diagnostic performance by focusing on anatomical regions within the volume. Specifically, we utilize masks from a universal segmentation module to capture local features for each referring region. A local feature decoupling (LFD) strategy is proposed to preserve the local high-resolution details with little computational overhead. Then the local features are integrated with global features to capture inter-regional relationships within a cohesive context. Moreover, we propose a novel region-report alignment (RRA) training strategy. It leverages the recognition of referring regions to guide the generation of region-specific reports, enhancing the model's referring and grounding capabilities while also improving the report's interpretability. A large language model (LLM) is further employed as the language decoder to generate reports from integrated visual features, facilitating region-level comprehension. Extensive experiments on two large-scale chest CT-report datasets demonstrate the superiority of our method, which outperforms several state-of-the-art methods in terms of both natural language generation and clinical efficacy metrics while preserving promising interpretability. The code will be made publicly available.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses a limitation of existing CT report generation methods: they rely mainly on the global features of the entire volume, so the model struggles to focus on specific regions and may miss abnormalities. To address this, the paper proposes Reg2RG, a region-guided referring and grounding framework that aims to improve diagnostic performance by focusing on anatomical regions within the CT volume. Specifically, it uses masks from a universal segmentation module to capture local features for each referring region, and it proposes a local feature decoupling (LFD) strategy that preserves high-resolution local details with little computational overhead. The local features are then integrated with global features to capture inter-regional relationships. The paper also proposes a region-report alignment (RRA) training strategy, which uses recognition of the referring regions to guide the generation of region-specific reports, improving both the model's referring and grounding capabilities and the interpretability of the reports. Experiments on two large-scale chest CT-report datasets show that the method outperforms several state-of-the-art methods on both natural language generation and clinical efficacy metrics while remaining interpretable.
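The idea of capturing mask-guided local features and combining them with a global feature can be illustrated with a toy sketch. This is not the Reg2RG implementation: it assumes plain masked average pooling over flattened voxel features in place of the paper's LFD strategy, and the names `mean_vector`, `region_tokens`, `voxel_features`, and `region_masks` are hypothetical, introduced only for illustration.

```python
from statistics import fmean


def mean_vector(vectors, dim):
    """Channel-wise mean of a list of feature vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [fmean(v[c] for v in vectors) for c in range(dim)]


def region_tokens(voxel_features, region_masks):
    """Build one token per region: [local masked mean | global mean].

    voxel_features: list of per-voxel feature vectors (a flattened volume).
    region_masks:   dict mapping a region name to a list of booleans,
                    one per voxel (hypothetical segmentation output).
    """
    dim = len(voxel_features[0])
    # Global context: average over every voxel in the volume.
    global_feature = mean_vector(voxel_features, dim)
    tokens = {}
    for name, mask in region_masks.items():
        # Local feature: average only over voxels inside the region mask.
        selected = [f for f, keep in zip(voxel_features, mask) if keep]
        local_feature = mean_vector(selected, dim)
        # Assumed integration step: simple concatenation of local and global.
        tokens[name] = local_feature + global_feature
    return tokens


# Toy volume with four voxels and two feature channels.
voxels = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
masks = {
    "lung": [True, True, False, False],
    "heart": [False, False, True, True],
}
for region, token in region_tokens(voxels, masks).items():
    print(region, token)
```

In a pipeline of the kind the summary describes, region-level tokens like these would be passed to the language decoder together with the global feature. How the paper actually decouples and integrates the features is not specified here.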


Automatic Report Generation Method Based on Multiscale Feature Extraction and Word Attention Network

An Inclusive Task-Aware Framework for Radiology Report Generation

Dia-LLaMA: Towards Large Language Model-driven CT Report Generation

Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation

3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model

Visual Grounding of Whole Radiology Reports for 3D CT Images

R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

Multifocal region-assisted cross-modality learning for chest X-ray report generation

See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning

Learning to Generate Radiology Findings from Impressions Based on Large Language Model

Resource-Efficient Medical Report Generation using Large Language Models

KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models

LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation

Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM