Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi

2023-12-08

Abstract: Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this advancement, latent space smoothness within diffusion models remains largely unexplored. Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image. This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations resulting from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image remains constant at every diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution not only in T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA to work with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.

Computer Vision and Pattern Recognition
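To make the constraint behind Step-wise Variation Regularization concrete, the sketch below estimates how far an output moves per unit move of its input latent and penalizes any deviation of that ratio from a constant. It is a minimal finite-difference illustration under stated assumptions, not the paper's training objective: the mapping `f`, the perturbation size `eps`, the fixed `target` ratio, and the helper names `variation_ratio` and `variation_regularizer` are all hypothetical. A real diffusion model would presumably apply such a penalty to the denoiser's output at a sampled training step using automatic differentiation rather than NumPy finite differences.

```python
import numpy as np


def variation_ratio(f, x, rng, eps=1e-3):
    """Estimate ||f(x + d) - f(x)|| / ||d|| for one small random perturbation d.

    f is a hypothetical stand-in for the mapping from an input latent to the
    model's output at one training step.
    """
    d = rng.standard_normal(x.shape)
    d *= eps / np.linalg.norm(d)  # rescale the random direction so that ||d|| == eps
    return np.linalg.norm(f(x + d) - f(x)) / np.linalg.norm(d)


def variation_regularizer(f, latents, target, rng, eps=1e-3):
    """Mean squared deviation of the per-latent variation ratio from a constant target."""
    ratios = np.array([variation_ratio(f, x, rng, eps) for x in latents])
    return float(np.mean((ratios - target) ** 2))


def smooth_map(x):
    """Toy mapping whose output always varies exactly twice as much as its input."""
    return 2.0 * x


def bumpy_map(x):
    """Toy non-smooth mapping: nearly flat, with jumps wherever a coordinate changes sign."""
    return np.sign(x) + 0.1 * x


rng = np.random.default_rng(0)
latents = rng.standard_normal((8, 16))
loss_smooth = variation_regularizer(smooth_map, latents, target=2.0, rng=rng)
loss_bumpy = variation_regularizer(bumpy_map, latents, target=2.0, rng=rng)
print(loss_smooth, loss_bumpy)
```

The mapping whose output always moves twice as far as its input incurs essentially zero penalty, while the step-like mapping, whose output barely moves most of the time and jumps occasionally, is penalized heavily. Penalizing the squared deviation from a constant, rather than simply shrinking the ratio, mirrors the abstract's goal of a steady (not vanishing) response to latent changes.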

What problem does this paper attempt to address?

The paper addresses the non-smoothness of the latent space in diffusion models, which degrades downstream tasks such as image interpolation, inversion, and editing. Specifically:

1. **Image Interpolation**: Existing diffusion models such as Stable Diffusion show significant visual fluctuations when the input latent changes only slightly, so images along an interpolation path change abruptly rather than continuously.
2. **Image Inversion**: When a source image is inverted into the latent space and regenerated, existing models often fail to reconstruct it faithfully, producing wrong colors or object orientations and sometimes turning one object into another.
3. **Image Editing**: Small changes to the text prompt can cause large, hard-to-control changes in image content and layout. Existing models also perform poorly in drag-based editing, where they tend to distort the shape and semantics of objects.

To address these issues, the paper proposes **Smooth Diffusion**, which enhances the smoothness of the latent space through **Step-wise Variation Regularization**: at every diffusion training step, the ratio between the variation of an input latent and the variation of the output image is kept constant. The authors report that Smooth Diffusion remains a strong text-to-image generator while performing better across these downstream tasks.
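The interpolation fluctuations described above, and the interpolation standard deviation (ISTD) metric that the abstract introduces to quantify them, can be illustrated with a toy generator. The sketch below assumes spherical linear interpolation between latents, measures each step as the mean squared difference between consecutive outputs, and reports the standard deviation of those step sizes: evenly spaced steps give a value near zero, while an abrupt jump inflates it. The helpers `slerp` and `interpolation_std` and both toy generators are hypothetical; the paper's exact ISTD definition (interpolation scheme, distance measure, number of steps) may differ.

```python
import numpy as np


def slerp(a, b, t):
    """Spherical linear interpolation between latent vectors a and b at fraction t."""
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < 1e-6:  # nearly parallel latents: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)


def interpolation_std(generate, a, b, num_steps=9):
    """ISTD-style score (assumed definition): std of gaps between consecutive outputs."""
    ts = np.linspace(0.0, 1.0, num_steps)
    outputs = [generate(slerp(a, b, t)) for t in ts]
    # Gap between neighbouring outputs, measured as mean squared difference.
    gaps = [np.mean((y1 - y0) ** 2) for y0, y1 in zip(outputs[:-1], outputs[1:])]
    return float(np.std(gaps))


def smooth_generator(z):
    """Toy generator whose output changes steadily with the latent."""
    return z


def jumpy_generator(z):
    """Toy generator that jumps abruptly once the second latent coordinate exceeds 0.5."""
    return z + np.array([10.0, 0.0]) * (z[1] > 0.5)


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(interpolation_std(smooth_generator, a, b))  # close to 0
print(interpolation_std(jumpy_generator, a, b))  # much larger
```

Both generators see exactly the same interpolation path; only the jumpy one concentrates its change into a single step, which is the kind of "noticeable visual fluctuation from minor latent variations" that a smoothness metric of this sort is meant to expose.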
