Progressive Transformer-Based Generation of Radiology Reports

Farhad Nooralahzadeh, Nicolas Perez Gonzalez, Thomas Frauenfelder, Koji Fujimoto, Michael Krauthammer
DOI: https://doi.org/10.48550/arXiv.2102.09777
2021-09-01
Abstract: Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them into finer and coherent texts using a transformer architecture. We follow the transformer-based sequence-to-sequence paradigm at each step. We improve upon the state-of-the-art on two benchmark datasets.
Computation and Language
### What problem does this paper attempt to solve?

This paper addresses the problem of automatic radiology report generation. Specifically, the authors propose a progressive Transformer-based generation framework to improve the quality and efficiency of radiology report generation.

#### Background and problem description

In medical practice, X-ray analysis is one of the most common and important tasks for radiologists. After years of training, these experts can identify specific features in the images and translate them into written reports. However, this process is time-consuming and labor-intensive, and it is especially difficult for young residents. As demand for imaging examinations grows, radiologists' workload keeps increasing, so technological support is needed to improve their workflow.

#### Limitations of existing methods

Previous research on radiology report generation has mainly focused on generating text directly from images. For example:

- Jing et al. (2018) introduced a co-attention mechanism to generate complete paragraphs.
- Lovelace and Mortazavi (2020) explored generating reports with a Transformer.
- Zhang et al. (2020) used pre-constructed graph embedding modules to assist report generation.
- Chen et al. (2020) proposed a memory-driven Transformer to generate radiology reports and demonstrated its superiority on language generation metrics and clinical evaluation.

Although these methods have had some success, they usually generate a complete report from an image in a single step, which may lead to a lack of coherence and accuracy in the generated report.

#### Innovations of the paper

To address these problems, the paper proposes a phased generation framework that divides radiology report generation into two steps:

1. **Generate global concepts from images**: First, extract high-level concepts from the X-ray images.
2. **Transform global concepts into detailed and coherent text**: Then use the Transformer architecture to refine these concepts into detailed and coherent text.

This progressive generation strategy is inspired by Curriculum Learning: by processing in phases, the model moves from coarse concepts to a finer report, with the aim of improving the quality and accuracy of the generated text.

#### Main contributions

1. **A progressive text generation model**: Incorporating high-level concepts into the generation process improves report generation.
2. **The model outperforms the baseline and other existing models**: On the IU X-RAY and MIMIC-CXR benchmark datasets, it improves the average BLEU score by +1.23% and the F1 score by +3.2%, respectively.
3. **Qualitative analysis**: A qualitative analysis further illustrates the quality and characteristics of the generated reports.

In conclusion, the paper reports improved radiology report generation through a progressive generation framework, which could help reduce radiologists' workload and improve their efficiency.
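
The two-step decomposition described above (image to global concepts, then concepts to report) can be illustrated with a minimal Python sketch. This is only an illustration of the data flow, not the authors' implementation: the function names, the type aliases, and the toy stand-in models are all hypothetical, and in the paper both steps are transformer sequence-to-sequence models rather than the hand-written rules used here.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases (assumptions for this sketch only):
# image features as a flat sequence of floats, text as a list of tokens.
ImageFeatures = Sequence[float]
Tokens = List[str]


def progressive_generate(
    image: ImageFeatures,
    image_to_concepts: Callable[[ImageFeatures], Tokens],
    concepts_to_report: Callable[[Tokens], Tokens],
) -> Tokens:
    """Run the two steps in sequence: image -> global concepts -> report."""
    concepts = image_to_concepts(image)   # step 1: coarse, high-level text
    return concepts_to_report(concepts)   # step 2: finer, coherent text


# Toy stand-ins (NOT the paper's models): simple rules that only make the
# pipeline runnable end to end.
def toy_image_to_concepts(image: ImageFeatures) -> Tokens:
    # Pretend that a high mean intensity signals an opacity.
    mean = sum(image) / len(image)
    return ["opacity"] if mean > 0.5 else ["no", "acute", "findings"]


def toy_concepts_to_report(concepts: Tokens) -> Tokens:
    # Expand the concept tokens into a sentence-like token sequence.
    return ["the", "study", "shows"] + concepts + ["."]


report = progressive_generate(
    [0.1, 0.2, 0.3], toy_image_to_concepts, toy_concepts_to_report
)
print(" ".join(report))
```

Passing the two steps as callables keeps the composition independent of how each step is implemented, so either stand-in could be swapped for a trained model without changing `progressive_generate`.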