Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi
2023-12-08
Abstract: Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse contents. Despite this advancement, latent space smoothness within diffusion models remains largely unexplored. Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image. This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations resulting from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image is constant at any diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution not only in T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA to work with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor performance of diffusion models on downstream tasks such as image interpolation, inversion, and editing, which it attributes to the non-smoothness of their latent spaces. Specifically:

1. **Image Interpolation**: Existing diffusion models (such as Stable Diffusion) produce significant visual fluctuations in the generated images under small changes to the latent variables, so interpolated sequences lack continuity and stability.
2. **Image Inversion**: Existing diffusion models fail to reconstruct the source image accurately, often producing incorrect colors and object orientations, and even rendering certain objects as different ones.
3. **Image Editing**: Even minor changes to the text prompt can lead to significant changes in image content and layout, making editing results difficult to control. Current diffusion models also perform poorly in drag-based editing, easily distorting the shape and semantics of objects.

To address these issues, the paper proposes a new diffusion model, **Smooth Diffusion**, which enhances the smoothness of the model's latent space by introducing **Step-wise Variation Regularization**. The authors report that this approach not only improves the quality of image generation but also performs well in various downstream tasks.
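
The two ideas named above, a smoothness score over an interpolation path and a penalty that keeps the output-to-input variation ratio constant, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the exact ISTD formula or the regularization loss, so the sketch assumes ISTD is the standard deviation of distances between adjacent outputs along a spherical interpolation, and that the regularizer is a squared deviation from a target ratio. The function names (`slerp`, `interpolation_std`, `variation_ratio_penalty`) and the two toy "generators" are hypothetical stand-ins for a real diffusion pipeline.

```python
import numpy as np


def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors."""
    z0_unit = z0 / np.linalg.norm(z0)
    z1_unit = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0_unit, z1_unit), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)


def interpolation_std(generate, z0, z1, num_steps=9):
    """Assumed ISTD-style score: std of distances between adjacent outputs.

    A value near zero means the output changes at a steady rate along the path.
    """
    ts = np.linspace(0.0, 1.0, num_steps)
    outputs = [generate(slerp(z0, z1, t)) for t in ts]
    dists = [np.linalg.norm(b - a) for a, b in zip(outputs[:-1], outputs[1:])]
    return float(np.std(dists))


def variation_ratio_penalty(generate, z, delta, target_ratio):
    """Assumed regularizer: squared gap between the variation ratio and a target."""
    ratio = np.linalg.norm(generate(z + delta) - generate(z)) / np.linalg.norm(delta)
    return float((ratio - target_ratio) ** 2)


# Hypothetical toy generators standing in for a diffusion model.
def smooth_generator(z):
    return 2.0 * z  # output varies in fixed proportion to the latent


def rough_generator(z):
    return 2.0 * z + 5.0 * float(z[0] > 0)  # abrupt jump when z[0] changes sign


z_start = np.array([1.0, 0.5])
z_end = np.array([-1.0, 0.5])

print("smooth ISTD:", interpolation_std(smooth_generator, z_start, z_end))
print("rough ISTD:", interpolation_std(rough_generator, z_start, z_end))
print(
    "smooth penalty:",
    variation_ratio_penalty(smooth_generator, z_start, np.array([0.01, -0.02]), 2.0),
)
```

In this toy setting the linear generator should score close to zero on both measures, while the generator with a discontinuity should score a visibly larger interpolation standard deviation, mirroring the "visual fluctuations" the paper describes. In a real experiment `generate` would wrap a full sampling run of the diffusion model and distances would be computed between images.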