Synthesizing High-Quality Construction Segmentation Datasets Through Pre-trained Diffusion Model

Jiahao Huo, Zhengyao Wang, Rui Zhao, Lijun Sun, Fei Shen
DOI: https://doi.org/10.1007/978-981-97-5609-4_27
2024-01-01
Abstract: Deep learning-based semantic segmentation methods have proven effective in various engineering fields, including construction management. However, the lack of large-scale, open-source datasets specific to the construction industry hinders the development of related techniques. Existing datasets in this domain rely heavily on manual collection and annotation, which is time-consuming and labor-intensive. In this paper, we address this challenge by leveraging pre-trained diffusion models to generate highly aligned image-mask pairs. The segmentation masks are produced automatically from the attention maps generated during the denoising process. Drawing inspiration from Otsu's method, we introduce a novel annotating algorithm that converts the cross-attention maps of the diffusion model into pixel-wise masks for the generated images using an automatically selected threshold. Experimental results demonstrate the effectiveness of our method as a data augmentation tool in few-shot semantic segmentation tasks. Our approach significantly enhances the performance of the segmentation backbone while using less than 1% of the real datasets. This highlights the potential of our method to overcome the scarcity of training data in the construction industry and to improve the accuracy and efficiency of semantic segmentation models.
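To illustrate the idea of turning an attention map into a mask with an automatically selected threshold, here is a minimal sketch of classic Otsu thresholding in Python with NumPy. It is not the annotating algorithm proposed in the paper, which the abstract describes only as inspired by Otsu's method. The function names `otsu_threshold` and `attention_to_mask`, the min-max normalization, and the 256-bin histogram are all assumptions made for this example, and the input is assumed to be a single 2D cross-attention map that has already been aggregated for one text token.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Classic Otsu: return the threshold in [0, 1] that maximizes
    the between-class variance of the histogram of `values`."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    prob = hist.astype(np.float64) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(prob)                # weight of the "background" class
    w1 = 1.0 - w0                       # weight of the "foreground" class
    mu_cum = np.cumsum(prob * centers)  # cumulative first moment
    mu_total = mu_cum[-1]

    # Between-class variance for every candidate split point.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu_cum) ** 2 / (w0 * w1)
    # Splits that leave one class empty are not valid candidates.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)

    k = int(np.argmax(sigma_b))
    return edges[k + 1]                 # upper edge of the best split bin


def attention_to_mask(attn):
    """Hypothetical helper: min-max normalize a 2D attention map and
    binarize it with the Otsu threshold. Returns a boolean mask."""
    attn = np.asarray(attn, dtype=np.float64)
    lo, hi = attn.min(), attn.max()
    norm = (attn - lo) / (hi - lo + 1e-12)  # epsilon guards a flat map
    return norm > otsu_threshold(norm.ravel())
```

Calling `attention_to_mask(attn)` on an `(H, W)` array returns a boolean array of the same shape, with `True` where the normalized attention exceeds the selected threshold. In a real pipeline the map would typically need to be upsampled to the image resolution first, and that step is left out here.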