Abstract: Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. Accurate interpretation of radiological images and automatic generation of radiology reports not only save clinicians' time but also reduce the risk of diagnostic errors. The core objective of automatic radiology report generation is a precise mapping between visual features and lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features with textual features to generate radiology reports. However, these approaches may overlook key lesion areas and lack sensitivity to lesion location information. Furthermore, multi-scale characterization and fine-grained alignment of medical visual features with report text features remain challenging, which lowers the quality of the generated reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module guides the model to focus on the lesion regions of the radiological image. Second, a channel attention network enhances the representation of location information and disease features. Finally, a cross-modal feature fusion module built on memory matrices facilitates fine-grained alignment between multi-scale visual features and report text features at the corresponding scales. The proposed method is evaluated experimentally on two publicly available radiological image datasets. The results show better performance than existing methods on the BLEU and ROUGE metrics. In particular, ROUGE improves by 4.8% and METEOR by 9.4% on the IU X-Ray dataset, and BLEU-1 improves by 7.4% and BLEU-2 by 7.6% on the MIMIC-CXR dataset.
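For readers unfamiliar with the reported metrics, the following is a minimal sketch of how BLEU-1 scores a single generated report against a single reference: clipped unigram precision multiplied by a brevity penalty. It is an illustration only, not the evaluation code behind the reported numbers. The function name `bleu1`, the whitespace tokenization, and the example sentences are assumptions made here, and published results are normally computed at corpus level with a standard toolkit.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 for one candidate/reference pair (illustrative sketch)."""
    # Assumption: lowercase whitespace tokenization; real toolkits tokenize more carefully.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clip each candidate word count by its count in the reference,
    # so repeating a correct word is not rewarded more than once per occurrence.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * precision


if __name__ == "__main__":
    # Hypothetical report sentences, not taken from either dataset.
    print(bleu1("no pleural effusion is seen", "no pleural effusion is seen"))
    print(bleu1("no effusion", "no pleural effusion is seen"))
```

The second call shows the effect of the brevity penalty: every word of the short candidate appears in the reference, yet the score falls well below 1 because the candidate omits most of the reference content.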

Chest radiology report generation based on cross-modal multi-scale feature fusion
