Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation

Yihang Zhou, Rebecca Towning, Zaid Awad, Stamatia Giannarou

2024-10-31

Abstract: Surgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach which utilizes segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. Novel class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents. These maps are then used by CASDM to generate surgical scene images, enhancing datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses two problems in surgical scene segmentation: data scarcity and class imbalance.

1. **Data Scarcity**: Surgical scene segmentation models need large amounts of annotated data for training. Obtaining this data is time-consuming and labor-intensive, and even professional surgeons often find it difficult to annotate low-contrast areas and unclear edges accurately.

2. **Class Imbalance**: Existing multi-class segmentation methods perform well on large, obvious anatomical structures and surgical tools, but they often struggle to identify classes that are significantly smaller or less frequent in the dataset. This imbalance leads to poor generalization on these rare classes at test time, which limits usefulness in surgery, especially where subtle abnormalities must be identified precisely.

To address these problems, the paper proposes the **Class-Aware Semantic Diffusion Model (CASDM)**, which uses segmentation maps as conditions for image synthesis. It introduces two new loss functions, a class-aware mean squared error loss and a class-aware self-perceptual loss, to prioritize critical, less visible classes and improve image quality and relevance. The authors also state that, to their knowledge, they are the first to generate multi-class segmentation maps from text prompts; CASDM then uses these maps to synthesize images, which enlarges and diversifies the dataset. The evaluation covers both image quality and downstream segmentation performance, and the authors report that CASDM produces realistic image-map pairs and improves segmentation, particularly for scarce and challenging classes.
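
To make the idea of a class-aware mean squared error concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this page does not give. It assumes one common choice, inverse-frequency weighting, where each pixel's squared error is scaled by a weight that grows as its class becomes rarer in the segmentation map. The function names `inverse_frequency_weights` and `class_aware_mse` and the toy data are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(label_map):
    """Return a per-class weight that is larger for rarer classes.

    Assumption for illustration: weight = total / (n_classes * count),
    so a class holding an equal share of the pixels gets weight 1.
    """
    counts = Counter(label_map)
    total = len(label_map)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def class_aware_mse(pred, target, label_map, weights):
    """Mean of squared errors, each scaled by the weight of the pixel's class."""
    if not (len(pred) == len(target) == len(label_map)):
        raise ValueError("pred, target and label_map must have the same length")
    weighted = [
        weights[cls] * (p - t) ** 2
        for p, t, cls in zip(pred, target, label_map)
    ]
    return sum(weighted) / len(weighted)


# Toy flattened "image": class 0 covers 6 pixels, class 1 (rare) covers 2.
label_map = [0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(label_map)

target = [0.0] * 8
error_on_rare = [0.0] * 7 + [1.0]      # unit error on a class-1 pixel
error_on_common = [1.0] + [0.0] * 7    # unit error on a class-0 pixel

loss_rare = class_aware_mse(error_on_rare, target, label_map, weights)
loss_common = class_aware_mse(error_on_common, target, label_map, weights)
print(loss_rare, loss_common)  # 0.25 and roughly 0.0833
```

An unweighted MSE would score both cases identically at 0.125. With the assumed weighting, the same error costs three times more on the rare class, which is the kind of emphasis on small, less visible classes that the summary describes. In a diffusion model the `pred` and `target` values would typically be predicted and true noise rather than pixel intensities.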

Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models

Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with Visual and Textual Prompts

SSIS-Seg: Simulation-Supervised Image Synthesis for Surgical Instrument Segmentation

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation

SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models

Data Augmentation in Class-Conditional Diffusion Model for Semi-Supervised Medical Image Segmentation

Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models

Less is More: Unsupervised Mask-guided Annotated CT Image Synthesis with Minimum Manual Segmentations

Ambiguous Medical Image Segmentation using Diffusion Models

CSG: A Context-Semantic Guided Diffusion Approach in De Novo Musculoskeletal Ultrasound Image Generation

Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation

SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

Label-informed cardiac magnetic resonance image synthesis through conditional generative adversarial networks

Semantic Image Synthesis for Abdominal CT

SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models

Shape-Consistent Generative Adversarial Networks for Multi-Modal Medical Segmentation Maps

Semantic Image Synthesis via Class-Adaptive Cross-Attention

SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis