Dual-Modality Visual Feature Flow for Medical Report Generation

Quan Tang, Liming Xu, Yongheng Wang, Bochuan Zheng, Jiancheng Lv, Xianhua Zeng, Weisheng Li
DOI: https://doi.org/10.1016/j.media.2024.103413
IF: 10.9
2024-12-04
Medical Image Analysis
Abstract: Medical report generation is a cross-modal task of generating medical text that aims to provide professional descriptions of medical images in clinical language. Although some methods have made progress, limitations remain, including insufficient focus on lesion areas, omission of internal edge features, and difficulty in aligning cross-modal data. To address these issues, we propose Dual-Modality Visual Feature Flow (DMVF) for medical report generation. First, we introduce region-level features on top of grid-level features to enhance the method's ability to identify lesions and key areas. Then, we enhance each of the two feature flows according to its attributes to prevent the loss of key information. Finally, we align the visual mappings from the different visual features with the report's textual embeddings through a feature fusion module to perform cross-modal learning. Extensive experiments on three benchmark datasets demonstrate that our approach outperforms state-of-the-art methods on both natural language generation and clinical efficacy metrics.
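The abstract does not specify how the feature fusion module works. As a rough illustration only, the sketch below shows one generic way to combine a grid-level stream and a region-level stream with report text embeddings: project both visual streams to a shared width, concatenate them into a single token set, and let the text tokens attend over it with scaled dot-product cross-attention. This is not the authors' DMVF module. The function name `fuse_and_attend`, the projection matrices, and all dimensions (a 7x7 grid of 2048-d features, 10 regions of 1024-d features, 20 text tokens of 512-d) are hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_attend(grid_feats, region_feats, text_emb, w_grid, w_region):
    """Hypothetical dual-stream fusion (not the paper's method).

    Projects each visual stream to the text width, concatenates the two
    streams into one set of visual tokens, then lets every text token attend
    over all visual tokens with scaled dot-product cross-attention.
    """
    g = grid_feats @ w_grid          # (n_grid, d): grid-level stream
    r = region_feats @ w_region      # (n_region, d): region-level stream
    visual = np.concatenate([g, r], axis=0)       # (n_grid + n_region, d)
    d = text_emb.shape[-1]
    scores = text_emb @ visual.T / np.sqrt(d)     # (n_text, n_visual)
    attn = softmax(scores, axis=-1)               # each row sums to 1
    return attn @ visual, attn                    # (n_text, d), (n_text, n_visual)

rng = np.random.default_rng(0)
grid = rng.standard_normal((49, 2048))     # assumed 7x7 CNN feature grid
regions = rng.standard_normal((10, 1024))  # assumed 10 detected regions
text = rng.standard_normal((20, 512))      # assumed 20 report-token embeddings
w_g = rng.standard_normal((2048, 512)) / np.sqrt(2048)
w_r = rng.standard_normal((1024, 512)) / np.sqrt(1024)

fused, attn = fuse_and_attend(grid, regions, text, w_g, w_r)
print(fused.shape)  # (20, 512)
print(attn.shape)   # (20, 59)
```

Concatenating the two streams lets each text token weigh grid cells and lesion regions against each other in a single softmax; a real model would learn the projections and typically use multi-head attention rather than the fixed random matrices used here.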
Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Artificial Intelligence; Radiology, Nuclear Medicine & Medical Imaging