TSGET: Two-Stage Global Enhanced Transformer for Automatic Radiology Report Generation

Xiulong Yi, You Fu, Ruiqing Liu, Hao Zhang, Rong Hua
DOI: https://doi.org/10.1109/jbhi.2024.3350077
IF: 7.7
2024-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Recently, automatic radiology report generation, which aims to generate multiple sentences that accurately describe the medical observations in given X-ray images, has gained increasing attention. Existing methods commonly employ the attention mechanism for accurate word generation. However, such attention-based methods fail to leverage useful image-level global features, thereby limiting the model's reasoning ability. To tackle this challenge, we propose two-stage global enhancement layers that help the Transformer generate more reliable reports from a global perspective. Specifically, the $1^{st}$ Global Enhancement Layer ($1^{st}$ GEL) is designed to capture global visual context features by establishing the relationships between image-level global features and previously generated words. The $2^{nd}$ Global Enhancement Layer ($2^{nd}$ GEL) is devised to capture region-global level features by building the relationships between image-level global features and region-level information. The experiments demonstrate that by integrating these two-stage global enhancement layers into the Transformer model, our proposal achieves state-of-the-art (SOTA) performance on various Natural Language Generation (NLG) evaluation metrics. Further Clinical Efficacy (CE) evaluations also validate that our proposal is able to predict more critical information. The code will be available at https://github.com/SKD-HPC/TSGET.
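The abstract names the two enhancement stages but does not give their equations, so the following is only an illustrative sketch of one way such a data flow could look, not the authors' implementation. It assumes plain scaled dot-product attention with residual connections; the function names (`attend`, `first_gel`, `second_gel`), the toy vectors, and the choice of which side acts as the query are all hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                       for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def add(a, b):
    # Element-wise sum, used here as a residual connection.
    return [x + y for x, y in zip(a, b)]

def first_gel(global_feat, prev_words):
    # Assumption: the image-level global feature queries the embeddings of
    # previously generated words (at least one, e.g. a start token) and the
    # result is added back to form a "global visual context" vector.
    return add(global_feat, attend(global_feat, prev_words, prev_words))

def second_gel(global_ctx, regions):
    # Assumption: each region feature attends over the global context plus
    # all region features, giving one "region-global" vector per region.
    memory = [global_ctx] + regions
    return [add(r, attend(r, memory, memory)) for r in regions]

# Toy inputs: one 4-d global feature, two word embeddings, three regions.
global_feat = [0.5, -0.2, 0.1, 0.3]
prev_words = [[0.1, 0.0, 0.2, -0.1], [0.4, 0.3, -0.2, 0.0]]
regions = [[0.2, 0.1, 0.0, 0.5], [-0.3, 0.2, 0.4, 0.1], [0.0, 0.0, 0.1, 0.2]]

global_ctx = first_gel(global_feat, prev_words)
region_global = second_gel(global_ctx, regions)
print(len(global_ctx), len(region_global), len(region_global[0]))
```

In this sketch the first stage yields a single context vector and the second stage yields one enhanced vector per image region, which a Transformer decoder could then consume. The actual layer definitions should be taken from the paper or its released code.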
computer science, interdisciplinary applications, mathematical & computational biology, medical informatics, information systems