Abstract: Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing agreements between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., patient's symptoms) for report generation, we add a transitional "bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0% F1 RadGraph improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% F1 CheXbert improvement on Two-view CXR.

What problem does this paper attempt to address?

This paper addresses three key problems in the automatic generation of chest X-ray reports:

1. **Single-view limitation**: Existing automatic report generation methods usually rely on single-view radiographs, which limits diagnostic accuracy. In clinical practice, multi-view imaging examinations (such as posteroanterior (PA), anteroposterior (AP), and lateral views) are crucial for accurate diagnosis and personalized treatment. Single-view methods therefore cannot fully use the anatomical information that multiple views provide, and may produce inaccurate or inconsistent reports.
2. **Utilization of patient-specific indications**: A patient's symptoms and other background information matter for the accuracy and relevance of a generated report. However, existing methods either ignore these indications or fail to fully exploit them. The paper proposes a transitional "bridge" mechanism that reduces the embedding space discrepancies caused by the presence or absence of indications, so that patient background is captured more consistently.
3. **Processing of multi-view data**: The number of radiographs varies from study to study, which makes it difficult to use all views of the same study to improve the clinical accuracy of the generated report. The proposed Multi-view enhanced Contrastive Learning (MCL) method addresses this and handles a variable number of views.

Specifically, the paper proposes the following solutions:

- **Multi-view enhanced contrastive learning**: Visual representation learning is improved by maximizing the semantic agreement between multi-view radiographs and their corresponding report.
- **Multi-view fusion module**: The variable number of radiographs in each study is integrated into fused visual features for subsequent cross-modal alignment.
- **Instance-level and token-level alignment losses**: Contrastive learning aligns multi-view radiographs with their corresponding reports at both the instance level and the token level, maximizing the agreement between them.
- **Report generation based on patient-specific indications**: A Transformer decoder makes full use of the available indications, and the transitional "bridge" mechanism reduces the embedding space discrepancies caused by their presence or absence.

With these components, MCL surpasses recent state-of-the-art methods on the MIMIC-CXR, MIMIC-ABN, Multi-view CXR, and Two-view CXR datasets, achieving a 5.0% F1 RadGraph improvement, a 7.3% BLEU-1 improvement, a 3.1% BLEU-4 improvement, and an 8.2% F1 mic-14 CheXbert improvement, respectively. These results indicate that MCL is effective at generating clinically accurate chest X-ray reports.
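The fusion, instance-level alignment, and "bridge" ideas described above can be illustrated with a minimal sketch. It is not the paper's implementation. The summary does not specify how views are fused or what form the bridge takes, so the mean pooling in `fuse_views`, the symmetric InfoNCE-style loss in `instance_contrastive_loss`, the `temperature` value, and the placeholder vector in `indication_embedding` are all assumptions chosen for illustration. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_views(view_features):
    """Fuse any number of per-view feature vectors into one study-level vector.

    Assumption: simple mean pooling stands in for the paper's fusion module.
    """
    if not view_features:
        raise ValueError("a study needs at least one view")
    n = len(view_features)
    dim = len(view_features[0])
    return [sum(view[i] for view in view_features) / n for i in range(dim)]


def _diagonal_cross_entropy(sim):
    """Mean of -log softmax(row)[i] over rows, i.e. the matching pair is the target."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(sim)


def instance_contrastive_loss(image_vecs, report_vecs, temperature=0.07):
    """Symmetric InfoNCE-style loss between study-level image and report vectors.

    Assumption: row i of image_vecs and row i of report_vecs belong to the same study.
    """
    imgs = [l2_normalize(v) for v in image_vecs]
    reps = [l2_normalize(v) for v in report_vecs]
    sim = [[sum(a * b for a, b in zip(i, r)) / temperature for r in reps] for i in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # report-to-image direction
    return 0.5 * (_diagonal_cross_entropy(sim) + _diagonal_cross_entropy(sim_t))


def indication_embedding(indication_vec, bridge_vec):
    """Return the indication embedding, or a placeholder "bridge" vector if it is missing.

    Assumption: the bridge is modeled here as a single stand-in vector.
    """
    return bridge_vec if indication_vec is None else indication_vec


# Toy batch: the first study has two views, the second has one.
studies = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0]],
]
reports = [[0.9, 0.1], [0.1, 0.9]]

fused = [fuse_views(views) for views in studies]
aligned_loss = instance_contrastive_loss(fused, reports)
shuffled_loss = instance_contrastive_loss(fused, reports[::-1])
print(aligned_loss < shuffled_loss)
```

In this toy batch, correctly paired studies and reports give a lower loss than mismatched pairs, which is the behavior that contrastive alignment is meant to encourage. Mean pooling is used only because it accepts any number of views; a learned fusion module would take its place in a real model.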

MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation

MKCL: Medical Knowledge with Contrastive Learning model for radiology report generation

Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

A Comparison of Maternal Interview and Medical Record Ascertainment of Violence among Women who had Poor Pregnancy Outcomes

Chest radiology report generation based on cross-modal multi-scale feature fusion

Radiology Report Generation via Structured Knowledge-Enhanced Multi-modal Attention and Contrastive Learning

Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning

Prediction of air pollutants by using an artificial neural network

Radiology Report Generation with a Learned Knowledge Base and Multi-Modal Alignment

Cross-modal Contrastive Attention Model for Medical Report Generation

MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement

Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation

Primed Self-Construal, Culture, and Stages of Impression Formation

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

Clinically Accurate Chest X-Ray Report Generation

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

Contrastive Attention for Automatic Chest X-ray Report Generation

Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation

A Clinical Neurophysiology Information System based on Tcl/Tk

Integrating MedCLIP and Cross-Modal Fusion for Automatic Radiology Report Generation