Abstract: Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from an X-ray image, which could relieve radiologists of the heavy burden of report writing. Although various image captioning methods have shown remarkable performance in the natural image field, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representations and adaptively distill the information with contextual and clinical knowledge for word prediction. In detail, a U-connection schema between the encoder and decoder is designed to model interactions between different modalities. In addition, a symptom graph and an injected knowledge distiller are developed to assist the report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results demonstrate the advantages of our architecture and the complementary benefits of the injected knowledge.

What problem does this paper attempt to address?

The paper aims to address the issue of automatic radiology report generation to alleviate the workload of radiologists in writing reports. Specifically, the goal of the research is to develop a method that can automatically generate accurate and coherent clinical reports from X-ray images.

To tackle the aforementioned problem, the authors propose a new model called "Knowledge-injected U-Transformer" (KiUT). KiUT is designed to learn multi-level visual representations and adaptively refine information through context and clinical knowledge for word prediction. Its main contributions include:

1. **Proposing a new encoder-decoder architecture**: This architecture utilizes a U-connection scheme to fully exploit visual information at different levels, rather than relying solely on single-modal information input. Experiments show that this U-connection scheme not only improves radiology report generation but also enhances performance in natural image captioning tasks.
2. **Knowledge injection mechanism**: By constructing a symptom graph and combining it with visual and contextual information, clinical knowledge is injected at the final stage of decoding. To effectively integrate this knowledge, the paper further designs an Injected Knowledge Distiller to distill useful information from visual, contextual, and clinical knowledge.
3. **Development of a Region Relationship Encoder**: To extract features of abnormal regions, the study also develops a Region Relationship Encoder to recover both external and internal relationships between image regions.
4. **Empirical evaluation**: The method is evaluated on two widely used benchmark datasets (IU-Xray and MIMIC-CXR), and the results show that KiUT outperforms existing state-of-the-art methods on these datasets.

In summary, this paper aims to improve the quality and accuracy of radiology report generation by introducing a novel encoder-decoder architecture and an effective knowledge injection strategy.
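The U-connection idea can be illustrated with a small wiring sketch. In a conventional Transformer, every decoder layer attends only to the final encoder output. The sketch below assumes a mirrored, U-Net-style pairing in which decoder layer i reads the output of encoder layer N - i + 1. The summary above does not spell out the exact wiring, so this pairing rule, the function names `u_connection_pairs` and `route_encoder_outputs`, and the placeholder feature labels are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def u_connection_pairs(num_layers: int) -> List[Tuple[int, int]]:
    """Return (decoder_layer, encoder_layer) index pairs, 1-indexed.

    Assumption (not confirmed by the source): mirrored wiring, so
    decoder layer i reads encoder layer num_layers - i + 1.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return [(i, num_layers - i + 1) for i in range(1, num_layers + 1)]


def route_encoder_outputs(encoder_outputs: List[str]) -> Dict[int, str]:
    """Map each decoder layer to the encoder output it would attend to.

    Strings stand in for feature tensors to keep the sketch dependency-free.
    """
    n = len(encoder_outputs)
    return {dec: encoder_outputs[enc - 1] for dec, enc in u_connection_pairs(n)}


# Placeholder labels for the outputs of a three-layer encoder.
pairs = u_connection_pairs(3)
routing = route_encoder_outputs(
    ["enc1_low_level", "enc2_mid_level", "enc3_high_level"]
)
print(pairs)
print(routing)
```

Under this assumed pairing, the first decoder layer sees the most abstract encoder features and the last decoder layer sees the lowest-level ones, which is one way a decoder could draw on visual information at several levels instead of a single encoder output.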

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

VMEKNet: Visual Memory and External Knowledge Based Network for Medical Report Generation.

Enhanced Knowledge Injection for Radiology Report Generation

An Inclusive Task-Aware Framework for Radiology Report Generation

Knowledge Matters: Radiology Report Generation with General and Specific Knowledge

Knowledge matters: Chest radiology report generation with general and specific knowledge

Generating Radiology Reports via Memory-driven Transformer

Radiology Report Generation with a Learned Knowledge Base and Multi-Modal Alignment

Automated Radiographic Report Generation Purely on Transformer: A Multicriteria Supervised Approach

Bridging the Gap: Cross-modal Knowledge Driven Network for Radiology Report Generation

Auxiliary signal-guided knowledge encoder-decoder for medical report generation

MATNet: Exploiting Multi-Modal Features for Radiology Report Generation.

Multi-modal transformer architecture for medical image analysis and automated report generation

When Radiology Report Generation Meets Knowledge Graph

Radiology Report Generation via Structured Knowledge-Enhanced Multi-modal Attention and Contrastive Learning.

Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation

CSAMDT: Conditional Self Attention Memory-Driven Transformers for Radiology Report Generation from Chest X-Ray

Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation

Boosting Radiology Report Generation by Infusing Comparison Prior

Clinical Context-aware Radiology Report Generation from Medical Images using Transformers

Automatic Generation of Chest X-ray Reports Using a Transformer-based Deep Learning Model