Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models

Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Stefanie Speidel
2024-11-21
Abstract: In computer-assisted surgery, automatically recognizing anatomical organs is crucial for understanding the surgical scene and providing intraoperative assistance. While machine learning models can identify such structures, their deployment is hindered by the need for labeled, diverse surgical datasets with anatomical annotations. Labeling multiple classes (i.e., organs) in a surgical scene is time-intensive, requiring medical experts. Although synthetically generated images can enhance segmentation performance, maintaining both organ structure and texture during generation is challenging. We introduce a multi-stage approach using diffusion models to generate multi-class surgical datasets with annotations. Our framework improves anatomy awareness by training organ-specific models with an inpainting objective guided by binary segmentation masks. The organs are generated with an inference pipeline using pre-trained ControlNet to maintain the organ structure. The synthetic multi-class datasets are constructed through an image composition step, ensuring structural and textural consistency. This versatile approach allows the generation of multi-class datasets from real binary datasets and simulated surgical masks. We thoroughly evaluate the generated datasets on image quality and downstream segmentation, achieving a 15% improvement in segmentation scores when combined with real images. The code is available at https://gitlab.com/nct_tso_public/muli-class-image-synthesis
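To make the final image composition step more concrete, here is a minimal sketch of how per-organ generated images and their binary masks could be merged into one multi-class image with a matching label map. Everything in it is an illustrative assumption, not the authors' implementation: the function name `compose_multiclass`, the rule that organ k receives class id k + 1 (0 = background), and the rule that a later organ overwrites an earlier one where masks overlap. Plain cut-and-paste is also a simplification. On its own it does not provide the structural and textural consistency that the abstract attributes to the authors' composition step.

```python
import numpy as np


def compose_multiclass(background, organ_images, organ_masks):
    """Paste per-organ generated images into one scene and build a label map.

    Hypothetical sketch: organ_images[k] is an H x W x 3 image generated for
    organ k, organ_masks[k] is its H x W binary mask, and organ k gets class
    id k + 1 (0 = background). Where masks overlap, the later organ wins.
    """
    composite = background.copy()
    labels = np.zeros(background.shape[:2], dtype=np.int64)
    for class_id, (image, mask) in enumerate(zip(organ_images, organ_masks), start=1):
        region = mask.astype(bool)
        composite[region] = image[region]  # copy only this organ's pixels
        labels[region] = class_id          # annotate them with its class id
    return composite, labels


# Toy usage: two hypothetical "organs" on a 4 x 4 canvas with overlapping masks.
h, w = 4, 4
background = np.zeros((h, w, 3), dtype=np.uint8)
liver = np.full((h, w, 3), 200, dtype=np.uint8)
stomach = np.full((h, w, 3), 100, dtype=np.uint8)
liver_mask = np.zeros((h, w), dtype=np.uint8)
liver_mask[:2, :] = 1    # rows 0-1
stomach_mask = np.zeros((h, w), dtype=np.uint8)
stomach_mask[1:3, :] = 1  # rows 1-2, overlapping the liver on row 1
img, lab = compose_multiclass(background, [liver, stomach], [liver_mask, stomach_mask])
```

In the toy example, row 1 is covered by both masks, so it ends up labeled as the stomach (class 2), and row 3 remains background. A real pipeline would also have to decide how to resolve overlapping organs and how to blend their boundaries.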
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenge of automatically identifying anatomical organs in computer-assisted surgery. Although machine-learning models can recognize these structures, their deployment is limited by the need for large-scale, diverse surgical datasets with anatomical annotations. Annotating multiple classes (i.e., organs) in surgical scenes is very time-consuming and requires medical experts. Moreover, although synthetically generated images can improve segmentation performance, preserving organ structure and texture during generation remains difficult. The paper therefore proposes a multi-stage method that uses diffusion models to generate annotated multi-class surgical datasets. It improves anatomy awareness by training organ-specific models and preserves organ structure during generation with a pre-trained ControlNet.