PhraseAug: an Augmented Medical Report Generation Model with Phrasebook

Xin Mei,Libin Yang,Denghong Gao,Xiaoyan Cai,Junwei Han,Tianming Liu
DOI: https://doi.org/10.1109/tmi.2024.3416190
IF: 10.6
2024-01-01
IEEE Transactions on Medical Imaging
Abstract:Medical report generation is a valuable and challenging task, which automatically generates accurate and fluent diagnostic reports for medical images, reducing workload of radiologists and improving efficiency of disease diagnosis. Fine-grained alignment of medical images and reports facilitates the exploration of close correlations between images and texts, which is crucial for cross-modal generation. However, visual and linguistic biases caused by radiologists’ writing styles make cross-modal image-text alignment difficult. To alleviate visual-linguistic bias, this paper discretizes medical reports and introduces an intermediate modality, i.e. phrasebook, consisting of key noun phrases. As discretized representation of medical reports, phrasebook contains both disease-related medical terms, and synonymous phrases representing different writing styles which can identify synonymous sentences, thereby promoting fine-grained alignment between images and reports. In this paper, an augmented two-stage medical report generation model with phrasebook (PhraseAug) is developed, which combines medical images, clinical histories and writing styles to generate diagnostic reports. In the first stage, phrasebook is used to extract semantically relevant important features and predict key phrases contained in the report. In the second stage, medical reports are generated according to the predicted key phrases which contain synonymous phrases, promoting our model to adapt to different writing styles and generating diverse medical reports. Experimental results on two public datasets, IU-Xray and MIMIC-CXR, demonstrate that our proposed PhraseAug outperforms state-of-the-art baselines.
What problem does this paper attempt to address?