MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Kun Xie, Zhicheng Jiao, Qiguang Miao
2024-11-15
Abstract: Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing agreements between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., patient's symptoms) for report generation, we add a transitional "bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0% F1 RadGraph improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% F1 CheXbert improvement on Two-view CXR.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses three problems in automatic chest X-ray report generation:

1. **Single-view limitation**: Existing report generation methods usually rely on a single radiograph, which limits diagnostic accuracy. In clinical practice, multi-view examinations (such as posteroanterior (PA), anteroposterior (AP), and lateral views) are crucial for accurate diagnosis and personalized treatment. Single-view methods cannot use the complementary anatomical information that multiple views provide, which can lead to inaccurate and inconsistent reports.
2. **Use of patient-specific indications**: A patient's symptoms and other background information matter for the accuracy and relevance of a report, yet existing methods either ignore these indications or fail to exploit them fully. The paper proposes a transitional "bridge" for studies with missing indications, which reduces the embedding space discrepancy between studies with and without an indication.
3. **Handling of multi-view data**: Different studies contain different numbers of radiographs, which makes it hard to use all views in a study to improve the clinical accuracy of the generated report. The proposed multi-view enhanced contrastive learning (MCL) method is designed to handle a variable number of views per study.

Specifically, the paper proposes the following solutions:

- **Multi-view enhanced contrastive learning**: Visual representations are learned by maximizing the agreement between the multi-view radiographs of a study and their corresponding report.
- **Multi-view fusion module**: The variable number of radiographs in each study is integrated into fused visual features, which are then used for cross-modal alignment.
- **Instance-level and token-level alignment losses**: Contrastive learning aligns multi-view radiographs with their corresponding reports at both the instance level and the token level.
- **Report generation based on patient-specific indications**: A Transformer decoder generates the report using the indication when it is available, and the transitional "bridge" reduces the embedding space discrepancy when it is missing.

With these methods, MCL surpasses recent state-of-the-art methods on multiple datasets: a 5.0% F1 RadGraph improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% F1 mic-14 CheXbert improvement on Two-view CXR. These results suggest that MCL generates more clinically accurate chest X-ray reports than the compared methods.
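The idea of fusing a variable number of views and aligning the fused feature with its report at the instance level can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the fusion here is a plain mean over per-view feature vectors, the loss is a symmetric InfoNCE-style objective, and the names `mean_fuse`, `instance_contrastive_loss`, and the temperature of 0.07 are all hypothetical choices made for this example.

```python
import math


def mean_fuse(views):
    """Average a variable-length list of per-view vectors into one study vector.

    Assumption: mean pooling stands in for the paper's fusion module.
    """
    n = len(views)
    dim = len(views[0])
    return [sum(v[d] for v in views) / n for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diag_log_softmax(logits):
    """For each row i, return the log-softmax value of its diagonal entry."""
    out = []
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[i] - lse)
    return out


def instance_contrastive_loss(studies, reports, temperature=0.07):
    """Symmetric InfoNCE-style loss between fused study features and reports.

    studies: list of studies, each a list of per-view feature vectors
             (studies may contain different numbers of views).
    reports: list of report feature vectors, one per study, in the same order.
    """
    img = [l2_normalize(mean_fuse(views)) for views in studies]
    txt = [l2_normalize(r) for r in reports]
    # Cosine similarities scaled by the temperature (assumed value).
    sim = [[dot(i, t) / temperature for t in txt] for i in img]
    sim_t = [list(col) for col in zip(*sim)]
    image_to_text = -sum(diag_log_softmax(sim)) / len(sim)
    text_to_image = -sum(diag_log_softmax(sim_t)) / len(sim_t)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: the first study has two views, the second has one.
studies = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]]
reports = [[1.0, 0.0], [0.0, 1.0]]

matched = instance_contrastive_loss(studies, reports)
mismatched = instance_contrastive_loss(studies, reports[::-1])
print(matched < mismatched)
```

In this toy batch, pairing each study with its own report should give a much lower loss than pairing it with the other study's report, which is the behavior a contrastive alignment objective is meant to reward. A token-level variant would apply a similar comparison between individual visual tokens and report tokens rather than whole-study vectors.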