A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data

Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Wei Emma Zhang, Weitong Chen, Xin Chen
2024-05-21
Abstract: Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, and has therefore become an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (e.g., medical images, clinical information, and medical knowledge), and produce comprehensive and accurate reports. Recently, numerous works have emerged to address this issue using deep learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion/interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, training strategies, public datasets, evaluation methods, current challenges, and future directions in this field are summarized. We have also conducted a quantitative comparison between different methods under the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and assist them in developing new algorithms to advance the field.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the issue of automatic radiology report generation. Specifically, it focuses on how to utilize multimodal data (such as medical images, clinical information, and medical knowledge) to generate comprehensive and accurate radiology reports. This issue is important because of the following:

1. **Reducing Doctors' Workload**: Manually writing radiology reports is a labor-intensive, time-consuming task that requires highly specialized knowledge. Automatic report generation can significantly reduce doctors' workload.
2. **Narrowing Regional Disparities in Medical Resources**: In some regions, insufficient medical resources lead to long patient wait times and increased risk of disease spread. Automatic report generation technology can help alleviate this problem.
3. **Improving Report Quality and Consistency**: Automatic report generation can ensure the consistency and accuracy of reports, reducing human errors.

### Background and Challenges

Generating high-quality reports automatically is a challenging task because it is inherently a multimodal problem. In routine clinical practice, radiologists need to combine image information with data from other modalities (such as medical history and relevant clinical indicators) to generate clear, correct, concise, complete, consistent, and coherent reports. However, most existing technologies primarily consider images as input. In recent years, multimodal deep learning technology has developed rapidly, and more and more studies are attempting to use multimodal data to generate diagnostic reports.

### Main Contributions of the Paper

1. **Analyzed an additional 22 papers using non-image inputs**: This is the first comprehensive survey of the latest technologies using multimodal inputs for report generation.
2. **Covered 89 papers from 2021 to 2024**: Provided the latest advancements in the field of automatic report generation.
3. **Proposed a general workflow**: Includes five key components: multimodal data acquisition, data preparation, feature learning, feature fusion/interaction, and report generation. Additionally, it summarizes training strategies, public datasets, and mainstream evaluation methods.

### Conclusion

This review paper provides researchers with a wealth of information, especially those interested in automatic clinical report generation and medical image analysis using multimodal inputs. It not only summarizes the latest technological advancements but also points out current challenges and future development directions. With this information, researchers can better develop new algorithms and advance the field.
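
To make the five-component workflow named above more concrete, here is a minimal Python sketch of how the stages might chain together. It is an illustration only, not the survey's method: the `Study` type, every function name, the toy data, and the hand-written "encoders" and template "decoder" are all assumptions standing in for the learned models (for example transformers) that the surveyed papers actually use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Study:
    """Hypothetical multimodal input: image pixels plus a clinical note."""
    pixels: List[float]   # flattened grayscale intensities in 0-255
    clinical_note: str    # free-text indication or history


def acquire() -> Study:
    # 1. Multi-modality data acquisition (hard-coded toy data, an assumption).
    return Study(pixels=[0.0, 128.0, 255.0, 64.0], clinical_note="Cough and fever")


def prepare(study: Study) -> Study:
    # 2. Data preparation: scale pixels to [0, 1] and lowercase the text.
    scaled = [p / 255.0 for p in study.pixels]
    return Study(pixels=scaled, clinical_note=study.clinical_note.lower())


def learn_features(study: Study) -> Tuple[List[float], List[float]]:
    # 3. Feature learning: trivial statistics stand in for learned encoders.
    image_feat = [sum(study.pixels) / len(study.pixels), max(study.pixels)]
    text_feat = [float(len(study.clinical_note.split()))]
    return image_feat, text_feat


def fuse(image_feat: List[float], text_feat: List[float]) -> List[float]:
    # 4. Feature fusion/interaction: plain concatenation of the two vectors.
    return image_feat + text_feat


def generate_report(fused: List[float]) -> str:
    # 5. Report generation: a fixed template stands in for a learned decoder.
    mean_intensity, peak, n_words = fused
    return (
        f"Mean intensity {mean_intensity:.2f}, peak {peak:.2f}; "
        f"clinical note has {int(n_words)} words."
    )


def run_pipeline() -> str:
    # Chain the five stages in the order the workflow lists them.
    study = prepare(acquire())
    image_feat, text_feat = learn_features(study)
    return generate_report(fuse(image_feat, text_feat))


if __name__ == "__main__":
    print(run_pipeline())
```

Concatenation is used for the fusion step only because it is the simplest choice to read; it is one option among many and is not meant to represent the approaches the survey compares.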