Generative Edge Detection with Stable Diffusion

Caixia Zhou, Yaping Huang, Mochu Xiang, Jiahui Ren, Haibin Ling, Jing Zhang
2024-10-04
Abstract: Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods. Recently, generative edge detection methods, especially diffusion-model-based solutions, have been introduced to the edge detection task. Despite their great potential, the retraining of task-specific modules and multi-step denoising inference limit their broader application. Upon closer investigation, we speculate that part of the reason is the under-exploration of the rich discriminative information encoded in extensively pre-trained large models (e.g., Stable Diffusion). Thus motivated, we propose a novel approach, named Generative Edge Detector (GED), that fully utilizes the potential of the pre-trained Stable Diffusion model. Owing to the rich high-level and low-level prior knowledge provided by pre-trained Stable Diffusion, our model can be trained and run at inference efficiently without a task-specific network design. Specifically, we propose to fine-tune the denoising U-Net to predict latent edge maps directly, taking the latent image feature maps as input. Additionally, given the subjectivity and ambiguity of edges, we incorporate edge granularity into the denoising U-Net as one of its conditions to achieve controllable and diverse predictions. Furthermore, we devise a granularity regularization to preserve the relative granularity relationship among the multiple predictions. We conduct extensive experiments on multiple datasets and achieve competitive performance (e.g., 0.870 ODS and 0.880 OIS on the BSDS test set).
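The two training-time mechanisms named in the abstract, conditioning the U-Net on edge granularity and regularizing the ordering of multi-granularity predictions, can be illustrated with a small, framework-free sketch. Following the method overview, the granularity value is projected by a fully-connected layer to the time-embedding dimension and added element-wise to the time embedding. Everything else here is an illustrative assumption rather than the authors' implementation: the function names are hypothetical, the alignment loss is assumed to be a mean squared error, the ordinal regularization is assumed to be a hinge penalty on edge density, and a larger granularity value is assumed to call for a denser edge map. Feature maps are flattened to plain Python lists to keep the example self-contained.

```python
from typing import List, Sequence


def granularity_embedding(g: float, weight: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Hypothetical fully-connected layer mapping a scalar granularity g
    to a vector with the same dimension as the time embedding."""
    return [w * g + b for w, b in zip(weight, bias)]


def condition_time_embedding(time_emb: Sequence[float], g: float,
                             weight: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Add the granularity embedding to the time embedding element-wise."""
    g_emb = granularity_embedding(g, weight, bias)
    return [t + e for t, e in zip(time_emb, g_emb)]


def latent_alignment_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed form: mean squared error between the predicted latent edge map
    and the VAE-encoded ground-truth edge map (both flattened)."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def granularity_regularization(granularities: Sequence[float],
                               edge_maps: Sequence[Sequence[float]],
                               margin: float = 0.0) -> float:
    """Assumed hinge-style ordinal penalty: whenever g_i < g_j, the map
    predicted for g_j should be at least as dense as the map for g_i."""
    densities = [sum(m) / len(m) for m in edge_maps]  # mean edge probability per map
    loss = 0.0
    for i, g_i in enumerate(granularities):
        for j, g_j in enumerate(granularities):
            if g_i < g_j:
                loss += max(0.0, densities[i] - densities[j] + margin)
    return loss


if __name__ == "__main__":
    emb = condition_time_embedding([0.1, 0.2, 0.3], 0.5, [1.0, 2.0, -1.0], [0.0, 0.0, 0.0])
    print(emb)
    # Lower-granularity map (g=0.2) sparser than the g=0.8 map: ordering respected.
    print(granularity_regularization([0.2, 0.8], [[0, 0, 1, 0], [1, 0, 1, 1]]))
    # Ordering violated: the g=0.8 map is sparser than the g=0.2 map.
    print(granularity_regularization([0.2, 0.8], [[1, 1, 1, 0], [1, 0, 0, 0]]))
```

Under these assumptions the hinge term is zero whenever the predicted maps are ordered consistently with their granularities and grows with the size of each violation; a positive `margin` would additionally demand a minimum density gap between neighbouring granularities.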
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to exploit the strengths of pre-trained diffusion models (such as Stable Diffusion) for edge detection while avoiding multi-step denoising and task-specific network designs. Although existing generative edge detection methods show promise, the retraining of task-specific modules and multi-step denoising inference limit their broader application. Moreover, these methods do not fully exploit the rich discriminative information encoded in large-scale pre-trained models such as Stable Diffusion. The paper therefore proposes the Generative Edge Detector (GED), which addresses these problems by fully exploiting the pre-trained Stable Diffusion model.

### Main contributions:

1. **A new generative edge detector (GED)**: By exploiting the rich prior knowledge of the pre-trained diffusion model, GED greatly reduces the number of training steps required.
2. **Predicting the latent edge map instead of noise**: This removes the need for multi-step denoising and task-specific network designs.
3. **Granularity information**: Granularity is integrated into the denoising U-Net to obtain diverse and controllable edge predictions, and an explicit ordinal regularization constrains the predictions to respect the order of their granularities.
4. **Experimental verification**: Experiments on multiple edge detection datasets demonstrate the effectiveness of the proposed method.

### Method overview:

- **Input encoding**: The VAE encoder of the pre-trained Stable Diffusion model maps the input image to latent-space features.
- **Granularity fusion**: A fully-connected layer projects the granularity value to a vector with the same dimension as the time embedding, which is then added element-wise to the time embedding.
- **Edge map prediction**: Only the denoising U-Net is fine-tuned; it predicts the corresponding latent edge map, which the VAE decoder then converts into the final edge prediction.
- **Loss functions**: A latent edge map alignment loss keeps the predicted edge map aligned with the ground truth, and a granularity regularization loss preserves the relative ordering of predictions made at different granularities.

### Experimental results:

- **BSDS**: GED reaches 0.870 ODS (Optimal Dataset Scale), 0.880 OIS (Optimal Image Scale), and 0.907 AP (Average Precision), outperforming existing methods.
- **Multicue**: GED achieves the best performance on both the edge detection and boundary detection tasks.
- **NYUD**: On indoor scenes, GED significantly improves the ODS and OIS metrics.
- **BIPED**: On outdoor images, GED achieves the best OIS and a higher AP.

### Conclusion:

By proposing GED, this paper exploits the potential of the pre-trained Stable Diffusion model while avoiding multi-step denoising and task-specific network designs, and achieves significant performance improvements on multiple edge detection datasets.
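ODS and OIS, the headline BSDS metrics reported for GED, differ only in where the binarization threshold is chosen: ODS uses a single threshold for the whole dataset, while OIS lets each image use its own best threshold before the counts are aggregated. The sketch below assumes that the boundary-matching step (matching predicted and ground-truth edge pixels within a distance tolerance) has already produced, for every image and threshold, four counts: matched and total ground-truth pixels, and matched and total predicted pixels. The function and variable names are chosen for illustration; this is not the official benchmark code.

```python
from typing import List, Sequence, Tuple

# Per-threshold counts for one image:
# (matched GT pixels, total GT pixels, matched predicted pixels, total predicted pixels)
Counts = Tuple[int, int, int, int]


def f_measure(cnt_r: float, sum_r: float, cnt_p: float, sum_p: float) -> float:
    """Harmonic mean of precision and recall computed from raw counts."""
    recall = cnt_r / sum_r if sum_r else 0.0
    precision = cnt_p / sum_p if sum_p else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ods(counts: Sequence[Sequence[Counts]]) -> float:
    """Optimal Dataset Scale: one threshold shared by all images.
    Counts are summed over images at each threshold before computing F."""
    best = 0.0
    for t in range(len(counts[0])):
        totals = [sum(img[t][k] for img in counts) for k in range(4)]
        best = max(best, f_measure(*totals))
    return best


def ois(counts: Sequence[Sequence[Counts]]) -> float:
    """Optimal Image Scale: each image keeps its own best threshold;
    the counts at those thresholds are then summed into one F-measure."""
    totals: List[int] = [0, 0, 0, 0]
    for img in counts:
        best_t = max(range(len(img)), key=lambda t: f_measure(*img[t]))
        for k in range(4):
            totals[k] += img[best_t][k]
    return f_measure(*totals)
```

For example, with two images whose best thresholds differ, `counts = [[(8, 10, 8, 10), (5, 10, 5, 5)], [(2, 10, 2, 10), (9, 10, 9, 10)]]`, ODS evaluates to 0.8 while OIS evaluates to 0.85 (up to floating-point rounding), because OIS does not force both images onto the same threshold.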