Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Haofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He
2024-04-01
Abstract: Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at
Computer Vision and Pattern Recognition, Graphics, Human-Computer Interaction, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses several key issues in point-based interactive image editing:

1. **Problems with Existing Methods**:
   - **DragGAN** and other methods based on Generative Adversarial Networks (GANs) struggle to maintain global content consistency during editing and are prone to inaccurate changes in local features.
   - **DragDiffusion**, although built on diffusion models, updates the diffusion latent map in response to user input, which alters the latent map globally. This results in imprecise preservation of the original content, and gradient vanishing can cause edits to fail.
2. **Core Contributions**:
   - The **DragNoise** method is proposed. It performs semantic editing on the bottleneck features of the U-Net and avoids retracing the latent map, achieving more stable and faster point-based interactive editing.
   - The predicted noise output of the U-Net is used as a semantic editor. Because high-level semantics are established early in denoising and vary little afterwards, an edit made in a single denoising step can be propagated through the later steps.
   - Experimental results show that DragNoise outperforms DragDiffusion in control and semantic retention, reducing optimization time by more than 50%.

In summary, the paper aims to provide a more efficient and controllable framework for point-based interactive image editing.
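
The idea of editing semantics in a single denoising step and then propagating the change can be illustrated with a toy control-flow sketch. This is not the authors' implementation: `toy_unet`, `denoise`, `edit_fn`, `edit_step`, and `propagate_until` are hypothetical names, the "U-Net" is a pair of scalar multiplications, and reusing the edited bottleneck feature in later steps is an assumption about how propagation might work. The sketch only shows the shape of the loop: the bottleneck is edited once, and the edited feature is substituted for a few subsequent steps instead of being re-optimized.

```python
import numpy as np

def toy_unet(latent, bottleneck_override=None):
    """Hypothetical stand-in for a U-Net: encoder -> bottleneck -> decoder."""
    bottleneck = 0.5 * latent                # toy "encoder"
    if bottleneck_override is not None:
        bottleneck = bottleneck_override     # inject the edited semantics
    noise_pred = 0.1 * bottleneck            # toy "decoder" predicts the noise
    return noise_pred, bottleneck

def denoise(latent, num_steps, edit_step, propagate_until, edit_fn=None):
    """Toy denoising loop: edit the bottleneck once, then reuse the edit."""
    edited = None
    for t in range(num_steps):
        override = None
        if edit_fn is not None:
            if t == edit_step:
                # Single editing step: read the bottleneck and edit it once.
                _, bottleneck = toy_unet(latent)
                edited = edit_fn(bottleneck)
                override = edited
            elif edit_step < t <= propagate_until:
                # Propagation (assumed): reuse the edited feature as-is.
                override = edited
        noise_pred, _ = toy_unet(latent, override)
        latent = latent - noise_pred         # toy update rule, not a real sampler
    return latent

x = np.ones((2, 2))
plain = denoise(x, num_steps=4, edit_step=1, propagate_until=2)
dragged = denoise(x, num_steps=4, edit_step=1, propagate_until=2,
                  edit_fn=lambda b: b + 1.0)
```

In a real system, `edit_fn` would be replaced by an optimization of the bottleneck feature toward the user's target points, and the update rule by an actual diffusion sampler; both are outside the scope of this sketch.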