Denoising diffusion probabilistic models for generation of realistic fully-annotated microscopy image datasets
Dennis Eschweiler, Rüveyda Yilmaz, Matisse Baumann, Ina Laube, Rijo Roy, Abin Jose, Daniel Brückner, Johannes Stegmaier
DOI: https://doi.org/10.1371/journal.pcbi.1011890
2024-02-21
PLoS Computational Biology
Abstract: Recent advances in computer vision have led to significant progress in the generation of realistic image data, with denoising diffusion probabilistic models proving to be a particularly effective method. In this study, we demonstrate that diffusion models can effectively generate fully-annotated microscopy image datasets through an unsupervised and intuitive approach, using rough sketches of desired structures as the starting point. The proposed pipeline helps to reduce the reliance on manual annotations when training deep learning-based segmentation approaches and enables the segmentation of diverse datasets without the need for human annotations. We demonstrate that segmentation models trained with a small set of synthetic image data reach accuracy levels comparable to those of generalist models trained with a large and diverse collection of manually annotated image data, thereby offering a streamlined and specialized application of segmentation models. Modern generative techniques have unlocked the potential to create realistic image data of high quality, prompting the possibility of substituting real image data in segmentation training workflows. Our study highlights the capacity of denoising diffusion probabilistic models to generate high-quality microscopy image data. With adjustments to the generation process, these models can produce realistic fully-annotated image datasets through an intuitive and unsupervised approach. The parameters of the generative pipeline undergo optimization through various evaluations, resulting in synthetic image data that exhibits high PSNR scores. Our practical experiments encompass multiple scenarios, including manual annotations, initial segmentations, and simulations as starting points, demonstrating the versatility of our approach.
Importantly, we compare the performance of segmentation models trained on a limited set of synthetic image data with those trained on a vast and diverse collection of manually annotated data, demonstrating the potential of our pipeline to alleviate the reliance on extensive manually annotated datasets. Our approach lays the groundwork for similar applications, thereby promoting the much-needed availability of publicly accessible fully-annotated image datasets and advancing the goal of annotation-free segmentation.
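As an illustration of what "using rough sketches as the starting point" can mean in a diffusion setting, the following minimal sketch applies the standard closed-form DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, to a toy binary sketch, and reports PSNR against the clean sketch. The abstract does not specify the noise schedule, the starting timestep, or the image scaling, so the linear schedule, the chosen timesteps, the disk-shaped sketch, and the function names below are all assumptions made for illustration. A trained denoising network, which is not shown here, would then run the reverse process from the noised sketch.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (an assumed choice, common in DDPM setups)."""
    return np.linspace(beta_start, beta_end, num_steps)

def noise_sketch(sketch, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in closed form for timestep index t."""
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta)
    a = alpha_bar[t]
    eps = rng.standard_normal(sketch.shape)  # Gaussian noise, same shape as x_0
    return np.sqrt(a) * sketch + np.sqrt(1.0 - a) * eps

def psnr(reference, estimate, data_range=2.0):
    """Peak signal-to-noise ratio in dB; data_range=2.0 matches images in [-1, 1]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

# Hypothetical sketch: a bright disk on a dark background, scaled to [-1, 1].
yy, xx = np.mgrid[:64, :64]
sketch = np.where((yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2, 1.0, -1.0)

rng = np.random.default_rng(0)
betas = linear_beta_schedule()

# Noising to an early step keeps most of the structure; a late step keeps little.
x_early = noise_sketch(sketch, 100, betas, rng)
x_late = noise_sketch(sketch, 600, betas, rng)

print("PSNR at t=100:", psnr(sketch, x_early))
print("PSNR at t=600:", psnr(sketch, x_late))
```

The starting timestep controls a trade-off: a small value preserves the sketch's structure (and thus its correspondence to the annotation) but leaves little room for the model to add realistic texture, while a large value does the opposite.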
biochemical research methods, mathematical & computational biology