Multifocal region-assisted cross-modality learning for chest X-ray report generation

Jing Lian, Zilong Dong, Huaikun Zhang, Yuekai Chen, Jizhao Liu
DOI: https://doi.org/10.1016/j.compbiomed.2024.109187
Abstract: The prevalence of cardiovascular disease, tumors, and other chronic illnesses has been steadily rising in recent years. Researchers have recently been employing cross-modal large-scale models and natural language generation models to address the significant visual and textual disparities in medical report generation tasks. However, training these models presents challenges, such as difficulty matching cross-modal information and generating specialized medical terminology. To tackle these issues, we propose a Multifocal Region-Assisted Report Generation Network (MRARGN) to enhance cross-modal information matching. Specifically, we integrate a pre-trained ResNet-50 with multi-channel and attention mechanisms for trainable X-ray image representation. We then combine our proposed memory response matrix with OpenAI's contrastive pre-training results to construct a dynamic knowledge graph that stores lesion features and their corresponding texts. Finally, we incorporate attention mechanisms and forget gate units to generate comprehensive textual descriptions for different lesions, using an image-report alignment loss. We conduct ablation experiments on the IU-Xray and MIMIC-CXR datasets to evaluate our approach. The experimental results demonstrate that our proposed MRARGN outperforms most state-of-the-art approaches, including their own variants.
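The abstract does not spell out the form of the image and report alignment loss. As a rough illustration only, the sketch below assumes a symmetric contrastive objective in the style of contrastive image-text pre-training, where matched image and report embeddings in a batch are pulled together and mismatched pairs are pushed apart. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    top = max(row)
    log_z = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - log_z for x in row]


def alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss (an assumed form, not the paper's definition).

    image_emb[k] and text_emb[k] are treated as a matched image-report pair;
    every other combination in the batch is treated as a negative.
    """
    images = [normalize(v) for v in image_emb]
    texts = [normalize(v) for v in text_emb]
    n = len(images)

    # Cosine similarity between every image and every report, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each row should put its mass on the diagonal entry.
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: the same check applied column-wise.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)


# Toy embeddings (hypothetical): two images and two reports in a 2-D space.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = alignment_loss(toy_images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = alignment_loss(toy_images, [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed form, correctly paired embeddings yield a lower loss than swapped pairs, which is the property an alignment term is meant to reward during training.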