Denoising Diffusions in Latent Space for Medical Image Segmentation

Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu
2024-07-18
Abstract: Diffusion models (DPMs) have demonstrated remarkable performance in image generation, often outperforming other generative models. Since their introduction, the powerful noise-to-image denoising pipeline has been extended to various discriminative tasks, including image segmentation. In medical imaging, the images are often large 3D scans, and segmenting a single image with DPMs becomes extremely inefficient because of large memory consumption and a time-consuming iterative sampling process. In this work, we propose a novel conditional generative modeling framework (LDSeg) that performs diffusion in latent space for medical image segmentation. Our proposed framework leverages the learned inherent low-dimensional latent distribution of the target object shapes and source image embeddings. The conditional diffusion in latent space not only ensures accurate n-D image segmentation for multi-label objects, but also mitigates the major underlying problems of traditional DPM-based segmentation: (1) large memory consumption, (2) a time-consuming sampling process, and (3) unnatural noise injection in the forward/reverse process. LDSeg achieved state-of-the-art segmentation accuracy on three medical image datasets with different imaging modalities. Furthermore, we show that our proposed model is significantly more robust to noise than traditional deterministic segmentation models, which could help address domain shift problems in the medical imaging domain. Code is available at: https://github.com/LDSeg/LDSeg.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses three key challenges that existing diffusion probabilistic models (DPMs) face in medical image segmentation:

1. **High memory consumption**: Medical image datasets usually contain large 3D scans, so segmenting them with DPMs consumes a large amount of memory.
2. **Time-consuming sampling process**: Traditional DPMs need many iterative sampling steps to produce high-quality segmentation results, which makes the whole process slow.
3. **Unnatural noise injection**: Adding Gaussian noise directly to the segmentation labels distorts their distribution in an unnatural way, especially in multi-class object segmentation, which makes the denoiser difficult to train.

To address these challenges, the authors propose a new conditional generative modeling framework (LDSeg) that performs medical image segmentation by diffusion in latent space. Specifically, LDSeg uses the low-dimensional latent distributions of the target object shape and the source image embedding, which preserves the accuracy of n-dimensional image segmentation while alleviating the above problems of traditional DPM-based segmentation.

### Main contributions:

1. **First use of univariate Gaussian latent space**: This is the first use of a univariate Gaussian latent representation of the target object shape to condition the denoiser and accelerate the sampling process.
2. **Continuous latent space**: The continuous latent space allows standard diffusion techniques to be applied directly, which solves the problem of unnatural noise injection in multi-class object segmentation.
3. **Reduction of memory consumption and acceleration of training/sampling**: Diffusion in latent space reduces memory consumption and speeds up both training and sampling, even for large 3D medical scans.
4. **Robustness to noise**: Because it uses a low-dimensional image embedding, LDSeg is more robust to high-frequency noise in the source image and handles noisy image acquisition better, which alleviates the domain shift problem.

### Method overview:

The LDSeg framework contains two main components:

1. **Mask auto-encoder**: Learns the low-dimensional latent representation of the target object shape.
2. **Conditional denoiser**: Learns the noise distribution at each time step, conditioned on the image embedding of the source image.

### Experimental results:

LDSeg was evaluated on three medical image datasets: Echo (a 2D+t echocardiogram video dataset), GlaS (a 2D histopathology dataset), and Knee (a 3D MRI dataset). The experimental results show that LDSeg outperforms existing methods in segmentation accuracy, computational efficiency, and robustness to noise.

### Discussion:

LDSeg is especially suitable for large 3D medical image datasets, because these datasets cannot be down-sampled without losing important imaging features. In addition, the faster sampling in the reverse process makes LDSeg more computationally efficient. LDSeg also shows significant robustness to noise in the source image, which helps with the noisy image acquisition problem.
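To make the idea of diffusing in latent space more concrete, here is a minimal NumPy sketch of the standard DDPM forward (noising) step applied to a latent code rather than to a label volume. It is an illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the number of steps, the array shapes, and the names `linear_beta_schedule` and `forward_diffuse` are all hypothetical, and the latent `z0` is drawn at random in place of the output of a trained mask auto-encoder.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(alpha_bar_t) * z_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

# Hypothetical sizes: a 3D label volume and the latent code that stands in for it.
mask_shape = (64, 128, 128)
latent_shape = (8, 16, 16)

# Stand-in for the mask auto-encoder output (no trained encoder is used here).
z0 = rng.standard_normal(latent_shape)

# Noise the continuous latent at an intermediate time step.
z_t, eps = forward_diffuse(z0, 500, alpha_bar, rng)

# How many fewer values the diffusion process touches in latent space.
ratio = np.prod(mask_shape) / np.prod(latent_shape)
print(f"latent shape: {z_t.shape}, voxel-to-latent size ratio: {ratio:.0f}")
```

Because the latent is continuous, Gaussian noise can be added to it directly, whereas adding the same noise to discrete multi-class labels is the "unnatural noise injection" the summary describes. A conditional denoiser would then be trained to predict `eps` from `z_t`, the time step, and the source image embedding.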