Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation

Rüveyda Yilmaz, Kaan Keven, Yuli Wu, Johannes Stegmaier
2024-11-19
Abstract: Automated cell segmentation in microscopy images is essential for biomedical research, yet conventional methods are labor-intensive and prone to error. While deep learning-based approaches have proven effective, they often require large annotated datasets, which are scarce due to the challenges of manual annotation. To overcome this, we propose a novel framework for synthesizing densely annotated 2D and 3D cell microscopy images using cascaded diffusion models. Our method synthesizes 2D and 3D cell masks from sparse 2D annotations using multi-level diffusion models and NeuS, a 3D surface reconstruction approach. Following that, a pretrained 2D Stable Diffusion model is finetuned to generate realistic cell textures and the final outputs are combined to form cell populations. We show that training a segmentation model with a combination of our synthetic data and real data improves cell segmentation performance by up to 9% across multiple datasets. Additionally, the FID scores indicate that the synthetic data closely resembles real data. The code for our proposed approach will be available at https://github.com/ruveydayilmaz0/cascaded_diffusion.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the following problem: **Automated cell segmentation in microscopy images is crucial for biomedical research, but conventional methods are labor-intensive and error-prone. Deep-learning-based methods have proven effective, but they usually require large annotated datasets, which are scarce because manual annotation is difficult.**

To address this, the authors propose a framework that uses cascaded diffusion models to synthesize densely annotated 2D and 3D cell microscopy images. The method has three stages:

1. **Generation of 2D and 3D cell masks**:
   - MaskDDPM, an architecture based on denoising diffusion probabilistic models, generates 2D cell masks from sparse 2D annotations.
   - SyncDreamer predicts multi-view-consistent 2D images, and NeuS (a 3D surface reconstruction method) fuses these views into 3D volume masks.
2. **Cell texture generation**:
   - A pretrained 2D Stable Diffusion model is fine-tuned to generate realistic cell textures, which are superimposed on the synthesized masks.
   - To keep textures continuous across the slices of a 3D image, the authors combine a noise vector shared by all slices with a noise vector drawn independently for each slice, so that the slices generated by Stable Diffusion have similar structure with appropriate variation.
3. **Cell population synthesis**:
   - The outputs for individual cells are combined into images containing multiple cells, mimicking the cell clustering seen in real data.

The authors show that training a segmentation model on a combination of synthetic and real data improves cell segmentation performance by up to 9% across multiple datasets. In addition, the FID scores indicate that the synthetic data closely resembles real data.
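The cell population synthesis step described above can be pictured with a small sketch. The summary does not say how the individual cells are placed, so the function name `compose_population`, the uniform random placement, and the overlap-rejection rule are all assumptions made for illustration and may differ from the authors' procedure.

```python
import numpy as np


def compose_population(cell_masks, canvas_shape, rng, max_tries=50):
    """Paste binary single-cell masks onto an empty canvas as an instance-label image.

    Hypothetical helper: uniform random placement with overlap rejection is an
    assumption for illustration, not the procedure described in the paper.
    """
    canvas = np.zeros(canvas_shape, dtype=np.int32)
    next_label = 1
    for mask in cell_masks:
        h, w = mask.shape
        if h > canvas_shape[0] or w > canvas_shape[1]:
            continue  # this cell does not fit on the canvas
        for _ in range(max_tries):
            y = int(rng.integers(0, canvas_shape[0] - h + 1))
            x = int(rng.integers(0, canvas_shape[1] - w + 1))
            region = canvas[y:y + h, x:x + w]  # view into the canvas
            if np.any(region[mask > 0]):
                continue  # would overlap an already placed cell, try again
            region[mask > 0] = next_label  # writes through to the canvas
            next_label += 1
            break
    return canvas


# Usage: three identical disk-shaped cells on a 64 x 64 canvas.
yy, xx = np.mgrid[:9, :9]
disk = ((yy - 4) ** 2 + (xx - 4) ** 2 <= 16).astype(np.uint8)
population = compose_population([disk] * 3, (64, 64), np.random.default_rng(0))
```

The returned array is an instance-label image (0 for background, one integer per cell), which is the form of dense annotation a segmentation model would typically train on. The same idea would extend to 3D by adding a depth axis.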
### Formula presentation

- **Multi-view consistency loss function of SyncDreamer**:

  \[ L(\theta)=\mathbb{E}_{t, x^{(1:N)}_0, n, \epsilon^{(1:N)}}\left[\left\|\epsilon^{(n)}-\epsilon^{(n)}_\theta(x^{(1:N)}_t, t)\right\|^2\right] \]

  where \(x^{(1:N)}_t\) is the noisy image at the diffusion time step \(t\), \(N\) is the number of predicted 2D views, and \(\epsilon^{(n)}\) and \(\epsilon^{(n)}_\theta\) are the added and predicted noises respectively.

- **Combination of noise vectors in 3D image generation**:

  \[ c_{s,T}=\rho\sqrt{\frac{1}{1 + \rho^2}}\cdot c_{\text{cmn}}+\sqrt{\frac{1}{1 + \rho^2}}\cdot c_{s,\text{unq}} \]

  where \(s\) is the slice number, \(\rho\) is the strength of texture consistency between slices, \(c_{\text{cmn}}\) is the shared noise vector, and \(c_{s,\text{unq}}\) is the noise vector independently generated for each slice.

Through these methods, this paper provides an effective solution to reduce the need for a large amount of labeled data and improve the performance of the cell segmentation task.
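The noise combination formula above can be checked numerically with a short sketch. It assumes that \(c_{s,T}\) is the initial noise for slice \(s\) and that both noise vectors are standard Gaussian; the function name `mix_slice_noise` and the array shapes are hypothetical. Under those assumptions the squared weights sum to one, \(\frac{\rho^2}{1+\rho^2}+\frac{1}{1+\rho^2}=1\), so the mixed noise keeps unit variance, and the correlation between any two slices is \(\frac{\rho^2}{1+\rho^2}\).

```python
import numpy as np


def mix_slice_noise(num_slices, shape, rho, rng):
    """Per-slice noise: rho * k * c_cmn + k * c_unq, with k = sqrt(1 / (1 + rho^2)).

    Hypothetical helper; assumes both noise vectors are standard Gaussian.
    """
    k = np.sqrt(1.0 / (1.0 + rho ** 2))
    c_cmn = rng.standard_normal(shape)                 # shared by all slices
    c_unq = rng.standard_normal((num_slices, *shape))  # drawn independently per slice
    return rho * k * c_cmn + k * c_unq                 # c_cmn broadcasts over the slice axis


# Usage: a larger rho makes neighboring slices start from more similar noise.
rng = np.random.default_rng(0)
noise = mix_slice_noise(num_slices=4, shape=(64, 64), rho=3.0, rng=rng)
corr = np.corrcoef(noise[0].ravel(), noise[1].ravel())[0, 1]
print(f"std = {noise.std():.3f}, slice correlation = {corr:.3f}")
```

With \(\rho = 0\) the slices start from fully independent noise, and as \(\rho\) grows they approach identical noise, which matches the description of \(\rho\) as the strength of texture consistency between slices.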