Semantic Image Synthesis for Abdominal CT

Yan Zhuang, Benjamin Hou, Tejas Sudharshan Mathai, Pritam Mukherjee, Boah Kim, Ronald M. Summers
2023-12-11
Abstract: As a new and promising type of generative model, diffusion models have been shown to outperform Generative Adversarial Networks (GANs) in multiple tasks, including image synthesis. In this work, we explore semantic image synthesis for abdominal CT using conditional diffusion models, which can be used for downstream applications such as data augmentation. We systematically evaluated the performance of three diffusion models, compared them with other state-of-the-art GAN-based approaches, and studied different conditioning scenarios for the semantic mask. Experimental results demonstrated that diffusion models were able to synthesize abdominal CT images with better quality. Additionally, encoding the mask and the input separately is more effective than naïve concatenation.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses semantic image synthesis for abdominal CT. Specifically, the authors explore conditional diffusion models that generate abdominal CT images matching a given semantic segmentation mask. The task matters in medical imaging for applications such as data augmentation, anonymization, and image editing. The authors evaluate how well diffusion models generate high-quality abdominal CT images and compare them with existing methods based on generative adversarial networks (GANs).

The main contributions of the paper are:

1. **Demonstrating the effectiveness of diffusion models**: The authors show that diffusion models are effective for semantic image synthesis of abdominal CT images and provide a comprehensive comparison with existing state-of-the-art GAN methods.
2. **Optimizing conditional configurations**: Through experiments, the authors find that encoding the mask and the input separately can significantly improve the performance of the diffusion model, which offers insight into how to use semantic mask information more effectively.

### Research Background

Semantic image synthesis aims to generate realistic images from semantic segmentation masks. Its applications include data augmentation, anonymization, and image editing. Although GANs have achieved remarkable results in image synthesis, recent studies have shown that diffusion models outperform GANs in multiple image synthesis tasks and can generate more realistic, higher-fidelity images.

### Method

The authors use the denoising diffusion probabilistic model (DDPM) for conditional image generation. Specifically, they explore three conditioning configurations:

1. **Channel-wise concatenation**: Concatenate the mask with the input image as additional input channels.
2. **Mask-guided**: Use a second U-Net encoder to encode the mask and inject the encoded information into the main U-Net branch.
3. **Edge-guided**: Add the semantic edge map as auxiliary information.

### Experimental Results

Through experiments on the AMOS22 dataset, the authors find that:

- **Performance convergence**: The performance of all three proposed models stabilizes after 150k iterations.
- **The mask-guided model performs best**: On most metrics, the mask-guided diffusion model performs best after 150k iterations, and it also converges faster in the early training stage (50k iterations).
- **The effect of auxiliary information is limited**: Using the edge map as auxiliary information does not significantly improve performance.

### Comparative Study

Compared with existing GAN methods (such as Pix2Pix, OASIS, and SPADE) and an existing diffusion model (SDM), the proposed diffusion models perform better on image quality metrics (such as FID, PSNR, and SSIM). However, on mask correspondence for small organs and structures, measured by the Dice similarity coefficient (DSC), GAN methods (especially OASIS) perform better.

### Conclusion

The authors systematically study diffusion models for abdominal CT image synthesis, and the experimental results show that diffusion models are superior to GAN methods in multiple settings. They also find that encoding the mask and the input separately can significantly improve the performance of the diffusion model.

### Future Work

Future work will focus on using diffusion models as a data augmentation strategy and evaluating their performance in downstream tasks (such as segmentation, classification, or detection). Because the sampling process of diffusion models is time-consuming, the authors also plan to accelerate inference by reducing the number of denoising steps.
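
As a rough illustration of the conditioning inputs described under Method and of the DSC metric used in the comparison, the sketch below builds a channel-wise concatenated input, derives a semantic edge map from a label mask, and computes a per-label Dice score. It is not the authors' implementation. Nested Python lists stand in for image tensors, and the function names, the one-hot mask encoding, the neighbour-difference definition of an edge, and the choice to append the edge map as an extra channel are all assumptions made for illustration. The mask-guided variant, which needs a second U-Net encoder, is not shown.

```python
# Illustrative sketch only: nested lists stand in for image tensors, and all
# names and encoding choices here are assumptions, not the paper's code.

def one_hot(mask, num_classes):
    """Turn an H x W label map into num_classes binary H x W channels."""
    return [[[1.0 if label == c else 0.0 for label in row] for row in mask]
            for c in range(num_classes)]


def semantic_edge_map(mask):
    """Mark pixels whose right or lower neighbour carries a different label."""
    h, w = len(mask), len(mask[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and mask[y][x + 1] != mask[y][x]
            below = y + 1 < h and mask[y + 1][x] != mask[y][x]
            if right or below:
                edges[y][x] = 1.0
    return edges


def channel_concat_input(noisy_image, mask, num_classes, use_edges=False):
    """Stack the noisy image with the mask channels (and optional edge map)."""
    channels = [noisy_image] + one_hot(mask, num_classes)
    if use_edges:  # assumed way of adding the edge map as auxiliary input
        channels.append(semantic_edge_map(mask))
    return channels


def dice(pred, target, label):
    """Dice similarity coefficient for one label between two label maps."""
    p = [v == label for row in pred for v in row]
    t = [v == label for row in target for v in row]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Convention assumed here: a label absent from both maps scores 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 3 x 3 example with three labels (0 = background).
mask = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 1]]
noisy = [[0.1, -0.3, 0.5],
         [0.0, 0.2, -0.1],
         [0.4, -0.2, 0.3]]
x = channel_concat_input(noisy, mask, num_classes=3, use_edges=True)
print(len(x))  # 5 channels: 1 image + 3 mask + 1 edge
```

In a real pipeline the stacked channels would be fed to the denoising U-Net at every diffusion step, and the Dice score would be computed between the conditioning mask and a segmentation predicted on the synthesized image. How the paper obtains that segmentation is not stated in this summary.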