Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation

Yihang Zhou, Rebecca Towning, Zaid Awad, Stamatia Giannarou
2024-10-31
Abstract: Surgical scene segmentation is essential for enhancing surgical precision, yet it is frequently compromised by the scarcity and imbalance of available data. To address these challenges, semantic image synthesis methods based on generative adversarial networks and diffusion models have been developed. However, these models often yield non-diverse images and fail to capture small, critical tissue classes, limiting their effectiveness. In response, we propose the Class-Aware Semantic Diffusion Model (CASDM), a novel approach which utilizes segmentation maps as conditions for image synthesis to tackle data scarcity and imbalance. Novel class-aware mean squared error and class-aware self-perceptual loss functions have been defined to prioritize critical, less visible classes, thereby enhancing image quality and relevance. Furthermore, to our knowledge, we are the first to generate multi-class segmentation maps using text prompts in a novel fashion to specify their contents. These maps are then used by CASDM to generate surgical scene images, enhancing datasets for training and validating segmentation models. Our evaluation, which assesses both image quality and downstream segmentation performance, demonstrates the strong effectiveness and generalisability of CASDM in producing realistic image-map pairs, significantly advancing surgical scene segmentation across diverse and challenging datasets.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two problems in surgical scene segmentation: data scarcity and class imbalance.

1. **Data scarcity**: Training a segmentation model requires a large amount of annotated data, but producing these annotations is time-consuming and labor-intensive, and even experienced surgeons find it difficult to annotate low-contrast regions and unclear boundaries accurately.
2. **Class imbalance**: Existing multi-class segmentation methods perform well on large, conspicuous anatomical structures and surgical tools, but often fail to identify classes that are much smaller or rarer in the dataset. As a result, models generalize poorly to these rare classes at test time, which limits their usefulness in surgery, particularly where subtle abnormalities must be identified precisely.

To address these problems, the paper proposes the **Class-Aware Semantic Diffusion Model (CASDM)**, which synthesizes images conditioned on segmentation maps. It introduces two new loss functions, a class-aware mean squared error loss and a class-aware self-perceptual loss, which prioritize small, less visible classes in order to improve image quality and relevance. The authors also state that, to their knowledge, they are the first to generate multi-class segmentation maps from text prompts; these maps then condition CASDM's image synthesis, increasing the diversity of the dataset. Experiments evaluating both image quality and downstream segmentation performance show that CASDM produces realistic image-map pairs and improves segmentation, especially for scarce and challenging classes.
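The summary names a class-aware mean squared error loss but does not give its formula. The sketch below is a hypothetical illustration only, assuming that the loss reweights the standard diffusion noise-prediction MSE pixel by pixel, using the inverse frequency of each pixel's class in the conditioning segmentation map, normalised so the mean pixel weight is 1. The function names `class_weights` and `class_aware_mse` and the inverse-frequency scheme are assumptions for illustration, not CASDM's published definition.

```python
import numpy as np


def class_weights(seg_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical inverse-frequency weight for each class in one segmentation map."""
    counts = np.bincount(seg_map.ravel(), minlength=num_classes).astype(np.float64)
    freq = counts / counts.sum()
    weights = np.zeros_like(freq)
    present = counts > 0
    # Rare classes (small freq) receive large weights; absent classes stay at 0.
    weights[present] = 1.0 / freq[present]
    return weights


def class_aware_mse(eps_pred: np.ndarray, eps_true: np.ndarray,
                    seg_map: np.ndarray, num_classes: int) -> float:
    """Noise-prediction MSE with each pixel reweighted by its class (assumed scheme).

    eps_pred, eps_true: arrays of shape (C, H, W); seg_map: integer labels of shape (H, W).
    """
    w_pixel = class_weights(seg_map, num_classes)[seg_map]  # per-pixel weight, (H, W)
    w_pixel = w_pixel / w_pixel.mean()  # mean pixel weight is 1, so uniform error == plain MSE
    sq_err = (eps_pred - eps_true) ** 2  # (C, H, W)
    return float((w_pixel[None, :, :] * sq_err).mean())
```

Under this assumed weighting every class present in the map receives the same total weight, regardless of its size. For example, in a 4×4 map where one pixel belongs to a rare class, an error confined to that pixel gives a plain MSE of 0.0625 but a class-aware value of 0.5, whereas a uniform error over the whole image gives the same value under both losses.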