Diff-CXR: Report-to-CXR generation through a disease-knowledge enhanced diffusion model

Peng Huang, Bowen Guo, Shuyu Liang, Junhu Fu, Yuanyuan Wang, Yi Guo
2024-10-26
Abstract: Text-To-Image (TTI) generation is significant for controlled and diverse image generation with broad potential applications. Although current medical TTI methods have made some progress in report-to-Chest-Xray (CXR) generation, their generation performance may be limited due to the intrinsic characteristics of medical data. In this paper, we propose a novel disease-knowledge enhanced Diffusion-based TTI learning framework, named Diff-CXR, for medical report-to-CXR generation. First, to minimize the negative impacts of noisy data on generation, we devise a Latent Noise Filtering Strategy that gradually learns the general patterns of anomalies and removes them in the latent space. Then, an Adaptive Vision-Aware Textual Learning Strategy is designed to learn concise and important report embeddings in a domain-specific Vision-Language Model, providing textual guidance for Chest-Xray generation. Finally, by incorporating the general disease knowledge into the pretrained TTI model via a delicate control adapter, a disease-knowledge enhanced diffusion model is introduced to achieve realistic and precise report-to-CXR generation. Experimentally, our Diff-CXR outperforms previous SOTA medical TTI methods by 33.4% / 8.0% and 23.8% / 56.4% in the FID and mAUC score on MIMIC-CXR and IU-Xray, with the lowest computational complexity at 29.641 GFLOPs. Downstream experiments on three thorax disease classification benchmarks and one CXR-report generation benchmark demonstrate that Diff-CXR is effective in improving classical CXR analysis methods. Notably, models trained on the combination of 1% real data and synthetic data can achieve a competitive mAUC score compared to models trained on all data, presenting promising clinical applications.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to improve the accuracy and realism of chest X-ray (CXR) images generated from medical reports, in the face of challenges posed by the inherent characteristics of medical data. Current medical text-to-image (TTI) methods may be limited in three respects:

1. **Impact of noisy data**: Medical imaging datasets usually contain noisy samples. A generative model may overfit these noise patterns, which degrades the realism and accuracy of the generated images.
2. **Complexity of long-text processing**: To handle lengthy medical reports, domain-specific vision-language models (VLMs) need a larger maximum text token length, which significantly increases computational complexity.
3. **Importance of disease information**: Medical reports usually describe the patient's disease conditions and manifestations in detail, so the disease representation must be preserved or emphasized during generation.

To address these problems, the paper proposes Diff-CXR, a disease-knowledge enhanced diffusion framework. Its main contributions are:

1. **Latent Noise Filtering Strategy (LNFS)**: Gradually learns the general patterns of anomalies and removes noisy data in the latent space, especially ambiguous samples near the decision boundary, to reduce the impact of noise on the generated results.
2. **Adaptive Vision-Aware Textual Learning Strategy (AVA-TLS)**: Enables a domain-specific vision-language model to explicitly model contextual relationships and dynamically learn concise and important report embeddings, which provide textual guidance for CXR generation.
3. **Disease-knowledge enhanced diffusion model**: Injects general disease knowledge into the pretrained TTI model through a delicate control adapter, strengthening the disease representation in the text embedding and gradually refining the output toward more realistic and precise report-to-CXR generation.

The experimental results show that Diff-CXR improves on previous SOTA medical TTI methods by 33.4% / 8.0% and 23.8% / 56.4% in FID and mAUC on two widely used benchmark datasets, MIMIC-CXR and IU-Xray, with the lowest computational complexity (29.641 GFLOPs). Downstream experiments also demonstrate the effectiveness and versatility of Diff-CXR for thoracic disease classification and CXR report generation.

In conclusion, the paper introduces a disease-knowledge enhanced diffusion model to overcome the challenges posed by the inherent characteristics of medical data, improving both the quality and the efficiency of generating chest X-rays from medical reports.
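To make the idea of filtering noisy samples in a latent space more concrete, here is a minimal Python sketch. It is not the paper's LNFS, whose details are not given in this summary. It assumes each image has already been encoded into a latent vector, and it swaps in a simple stand-in rule: over several rounds, drop the samples farthest from the centroid of the samples still kept. The function name `filter_latents` and the parameters `rounds` and `keep_quantile` are hypothetical.

```python
import numpy as np

def filter_latents(latents, rounds=3, keep_quantile=0.95):
    """Iteratively drop latent vectors farthest from the running centroid.

    Hypothetical stand-in for a latent-space noise filter (not the paper's LNFS).
    Returns a boolean mask over the rows of `latents` marking kept samples.
    """
    latents = np.asarray(latents, dtype=float)
    keep = np.ones(len(latents), dtype=bool)
    for _ in range(rounds):
        # Re-estimate the "typical" latent from the samples kept so far.
        centroid = latents[keep].mean(axis=0)
        dist = np.linalg.norm(latents - centroid, axis=1)
        # Threshold at a quantile of the distances among kept samples.
        threshold = np.quantile(dist[keep], keep_quantile)
        # Samples removed in an earlier round stay removed.
        keep &= dist <= threshold
    return keep

# Toy data (assumed): 200 "clean" latents plus 10 far-away "noisy" ones.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 8))
noisy = rng.normal(8.0, 1.0, size=(10, 8))
latents = np.vstack([clean, noisy])
mask = filter_latents(latents)
```

Re-estimating the centroid each round means the remaining samples define "normal" progressively more tightly, which loosely mirrors the gradual filtering described above. A learned anomaly model would replace the distance rule in practice.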