Automatic Report Generation Method Based on Multiscale Feature Extraction and Word Attention Network.
Xin Du, Haiwei Pan, Kejia Zhang, Shuning He, Xiaofei Bian, Weipeng Chen
DOI: https://doi.org/10.1007/978-3-031-25198-6_40
2023-01-01
Abstract: A medical report is a textual description of the information presented in a medical image, including detailed findings about different body organs and the radiologist’s diagnosis. However, summarizing medical image content into a complete and accurate report is time-consuming and repetitive work for doctors. Although there are many studies in the field of automatic medical report generation, many challenges remain. First, when describing multiple organs and lesions presented in medical images, reports generated from single-scale feature extraction are still inadequate and inaccurate. Second, most existing methods produce reports with duplicate words or missing key descriptions. To solve these problems, we propose the Multiscale Feature Extraction and Word Attention Network (MFWAN), an automatic medical report generation model. The model contains three modules. To focus on abnormalities in different regions, the EPSA (Efficient Pyramid Split Attention) Multiscale Feature Extraction module utilizes spatial information at different scales of medical images. The visual features are then classified by a Multi-Classification Context Generation module to generate context messages. Finally, by assigning different weights to the hidden layers of the word LSTM, the Word-Attention-Based Report Generation module generates more accurate words that carry implicit disease-critical information. Experimental results on the benchmark dataset IU X-Ray show that our proposed MFWAN outperforms previous works and generates more accurate reports.
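The idea of assigning different weights to the hidden states of the word LSTM can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so the dot-product scoring, the `query` vector, and the function names below are assumptions made purely for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_hidden_states(hidden_states, query):
    """Weight each hidden state by its relevance to a query vector.

    hidden_states: list of T vectors (hypothetical word-LSTM hidden states).
    query: a vector of the same dimension (hypothetical context signal).
    Scoring is a plain dot product, which is an assumption; the paper may
    use a different scoring function.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy example: three 2-dimensional hidden states and one query.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attend_hidden_states(hidden_states, query)
```

In this toy input the third hidden state aligns best with the query and so receives the largest weight; the weights always sum to one, so the context vector stays on the same scale as the hidden states.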