KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Zhongzhen Huang, Xiaofan Zhang, Shaoting Zhang
2023-06-20
Abstract: Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image, which could relieve radiologists from the heavy burden of report writing. Although various image caption methods have shown remarkable performance in the natural image field, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information with contextual and clinical knowledge for word prediction. In detail, a U-connection schema between the encoder and decoder is designed to model interactions between different modalities. In addition, a symptom graph and an injected knowledge distiller are developed to assist the report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results prove the advantages of our architecture and the complementary benefits of the injected knowledge.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The paper addresses automatic radiology report generation, with the aim of reducing the time radiologists spend writing reports. Specifically, the goal is a method that automatically generates accurate and coherent clinical reports from X-ray images. To tackle this problem, the authors propose a new model called "Knowledge-injected U-Transformer" (KiUT). KiUT is designed to learn multi-level visual representations and to adaptively distill that information using contextual and clinical knowledge for word prediction. Its main contributions include:

1. **Proposing a new encoder-decoder architecture**: This architecture uses a U-connection scheme between the encoder and decoder to exploit visual information at different levels, rather than relying solely on single-modal information input. Experiments show that this U-connection scheme not only improves radiology report generation but also enhances performance in natural image captioning tasks.
2. **Knowledge injection mechanism**: A symptom graph is constructed and combined with visual and contextual information, so that clinical knowledge is injected at the final stage of decoding. To integrate this knowledge effectively, the paper further designs an Injected Knowledge Distiller that distills useful information from visual, contextual, and clinical knowledge.
3. **Development of a Region Relationship Encoder**: To extract features of abnormal regions, the study also develops a Region Relationship Encoder to recover both external and internal relationships between image regions.
4. **Empirical evaluation**: The method is evaluated on two widely used benchmark datasets (IU-Xray and MIMIC-CXR), and the results show that KiUT outperforms existing state-of-the-art methods on these datasets.

In summary, this paper aims to improve the quality and accuracy of radiology report generation by introducing a novel encoder-decoder architecture and an effective knowledge injection strategy.
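
To make the U-connection idea more concrete, here is a minimal Python sketch of how per-layer encoder outputs could be routed to decoder layers. It is not the authors' implementation. The mirrored, U-Net-style pairing (the first decoder layer reads the deepest encoder layer, the last decoder layer reads the shallowest) is an assumption, since the summary above does not spell out the exact wiring. The function names `u_connection_pairs` and `route_encoder_outputs` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def u_connection_pairs(num_layers: int) -> List[Tuple[int, int]]:
    """Return (decoder_layer, encoder_layer) pairs, both 1-indexed.

    Assumption (not confirmed by the summary): the pairing is mirrored,
    as in a U-Net, so decoder layer d reads encoder layer N - d + 1.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return [(d, num_layers - d + 1) for d in range(1, num_layers + 1)]


def route_encoder_outputs(encoder_outputs: Sequence[T]) -> List[T]:
    """Reorder per-layer encoder outputs into the order in which the
    decoder layers would consume them under the mirrored pairing.

    encoder_outputs[0] is the shallowest encoder layer's output and
    encoder_outputs[-1] is the deepest one.
    """
    pairs = u_connection_pairs(len(encoder_outputs))
    # Convert the 1-indexed encoder layer number to a 0-indexed position.
    return [encoder_outputs[enc - 1] for _, enc in pairs]


if __name__ == "__main__":
    # Placeholder strings stand in for real feature tensors.
    features = ["enc_layer_1", "enc_layer_2", "enc_layer_3"]
    for (dec, enc), feat in zip(u_connection_pairs(3), route_encoder_outputs(features)):
        print(f"decoder layer {dec} attends to encoder layer {enc}: {feat}")
```

The contrast is with a standard Transformer encoder-decoder, where every decoder layer attends only to the final encoder layer's output. Under this sketch's assumption, each decoder layer instead sees a different level of visual representation, which is one way to read "exploiting visual information at different levels".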