Simulating Doctors' Thinking Logic for Chest X-ray Report Generation Via Transformer-based Semantic Query Learning.

Danyang Gao, Ming Kong, Yongrui Zhao, Jing Huang, Zhengxing Huang, Kun Kuang, Fei Wu, Qiang Zhu
DOI: https://doi.org/10.1016/j.media.2023.102982
IF: 10.9
2024-01-01
Medical Image Analysis
Abstract: Medical report generation can be viewed as the process by which doctors observe, understand, and describe images from different perspectives. Following this process, this paper proposes a Transformer-based Semantic Query learning paradigm (TranSQ). In brief, the paradigm learns a set of intention embeddings, uses them to issue semantic queries against the visual features, generates intent-compliant sentence candidates, and assembles them into a coherent report. During training, a bipartite matching mechanism establishes a dynamic correspondence between the intention embeddings and the sentences, so that the observation intentions absorb medical concepts. Experimental results on two major radiology reporting datasets (IU X-ray and MIMIC-CXR) show that the model outperforms state-of-the-art models in generation effectiveness and clinical efficacy. In addition, comprehensive ablation experiments validate the TranSQ model's innovations and interpretability. The code is available at https://github.com/zjukongming/TranSQ.
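The bipartite matching step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the cost function or the solver, so the cosine-based cost, the brute-force search (standing in for an efficient solver such as the Hungarian algorithm), and the names `cosine` and `match_queries_to_sentences` are all assumptions made for illustration. The idea shown is that each ground-truth sentence is assigned to exactly one query output so that the total cost is minimal, and queries left unassigned simply produce no sentence.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_queries_to_sentences(query_outputs, sentence_embeddings):
    """Return {sentence_index: query_index} with minimal total (1 - cosine) cost.

    Hypothetical helper: brute force over all one-to-one assignments, which is
    only practical for tiny inputs. A real system would use a polynomial-time
    assignment solver instead.
    """
    n_queries, n_sentences = len(query_outputs), len(sentence_embeddings)
    if n_sentences > n_queries:
        raise ValueError("need at least as many queries as sentences")

    # cost[q][s]: how poorly query q's output matches sentence s (assumed cost).
    cost = [[1.0 - cosine(q, s) for s in sentence_embeddings] for q in query_outputs]

    best_total, best_assignment = float("inf"), ()
    # Each candidate picks a distinct query for every sentence, in sentence order.
    for assignment in permutations(range(n_queries), n_sentences):
        total = sum(cost[q][s] for s, q in enumerate(assignment))
        if total < best_total:
            best_total, best_assignment = total, assignment
    return dict(enumerate(best_assignment))


# Toy data: three query outputs, two ground-truth sentence embeddings.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentences = [[0.1, 0.9], [0.9, 0.1]]
matching = match_queries_to_sentences(queries, sentences)
print(matching)  # {0: 1, 1: 0}
```

In this toy case sentence 0 is matched to query 1 and sentence 1 to query 0, while query 2 stays unmatched. Because the assignment is recomputed for every training sample, the correspondence between queries and sentences is dynamic rather than fixed in advance.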