Diffusion Models already have a Semantic Latent Space

Mingi Kwon,Jaeseok Jeong,Youngjung Uh

DOI: https://doi.org/10.48550/arXiv.2210.10960

2023-03-29

Abstract: Diffusion models achieve outstanding generative performance in various domains. Despite their great success, they lack semantic latent space which is essential for controlling the generative process. To address the problem, we propose asymmetric reverse process (Asyrp) which discovers the semantic latent space in frozen pretrained diffusion models. Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation: homogeneity, linearity, robustness, and consistency across timesteps. In addition, we introduce a principled design of the generative process for versatile editing and quality boosting by quantifiable measures: editing strength of an interval and quality deficiency at a timestep. Our method is applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES). Project page: https://kwonminki.github.io/Asyrp/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the following problem: although diffusion models generate high-quality samples, they lack a semantic latent space, which makes the generative process difficult to control. Existing approaches (image guidance, classifier guidance, and fine-tuning the entire model) each have limitations, including non-intuitive control, degraded quality, or the need for additional training. To address this, the authors propose the asymmetric reverse process (Asyrp), which discovers a semantic latent space in a frozen pretrained diffusion model and thereby enables effective control over generation. The authors name this semantic latent space **h-space** and report that it has the following properties:

- **Homogeneity**: The same displacement in this space has the same effect on all images.
- **Linearity**: Linear changes in this space lead to linear changes in attributes.
- **Robustness**: Changes in this space do not degrade the quality of the generated images.
- **Consistency across timesteps**: Changes remain almost consistent across timesteps.

In addition, the authors introduce a principled design of the generative process for versatile editing and quality boosting, based on two quantifiable measures: the editing strength of an interval and the quality deficiency at a timestep. The method is applicable to multiple architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES).

In summary, the paper aims to overcome the limited semantic control of existing diffusion models by introducing Asyrp and h-space, enabling more powerful and flexible image editing.
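The idea of an asymmetric reverse step can be sketched with a toy example. The summary above does not give the update rule, so the sketch assumes the deterministic DDIM-style form in which the edited noise prediction is used only for the predicted clean image, while the unedited prediction is kept for the term that points back toward the noisy sample. Everything named here (`toy_eps_model`, `asyrp_step`, the weight matrices, the dimensions, the alpha values) is hypothetical and chosen for illustration; the tiny network stands in for a frozen pretrained U-Net and is not the authors' implementation.

```python
import numpy as np

DIM, H_DIM = 8, 4
_rng = np.random.default_rng(0)
# Fixed random weights play the role of a frozen, pretrained network (assumption).
W_ENC = 0.1 * _rng.standard_normal((H_DIM, DIM))
W_DEC = 0.1 * _rng.standard_normal((DIM, H_DIM))


def toy_eps_model(x_t, delta_h=None):
    """Hypothetical noise predictor: encode x_t to a bottleneck h, then decode."""
    h = np.tanh(W_ENC @ x_t)
    if delta_h is not None:
        h = h + delta_h  # the edit is applied in the bottleneck ("h-space")
    return W_DEC @ h


def asyrp_step(x_t, alpha_t, alpha_prev, delta_h):
    """One deterministic DDIM-style step with an asymmetric edit (sketch)."""
    eps = toy_eps_model(x_t)                 # unedited noise prediction
    eps_edit = toy_eps_model(x_t, delta_h)   # prediction with shifted h
    # Predicted clean image uses the edited prediction ...
    pred_x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps_edit) / np.sqrt(alpha_t)
    # ... while the direction back toward x_t keeps the unedited one.
    direction = np.sqrt(1.0 - alpha_prev) * eps
    return np.sqrt(alpha_prev) * pred_x0 + direction


x_a = _rng.standard_normal(DIM)
x_b = _rng.standard_normal(DIM)
delta = np.array([0.5, 0.0, -0.5, 0.0])
zero = np.zeros(H_DIM)

base_a = asyrp_step(x_a, 0.5, 0.6, zero)
shift_a = asyrp_step(x_a, 0.5, 0.6, delta) - base_a
shift_b = asyrp_step(x_b, 0.5, 0.6, delta) - asyrp_step(x_b, 0.5, 0.6, zero)
shift_a2 = asyrp_step(x_a, 0.5, 0.6, 2 * delta) - base_a

print("same shift for both samples:", np.allclose(shift_a, shift_b))
print("doubling delta doubles the shift:", np.allclose(shift_a2, 2 * shift_a))
```

In this toy the decoder is linear in h, so the same displacement shifts every sample identically and scaling the displacement scales the shift, purely by construction. That only illustrates what homogeneity and linearity mean; it says nothing about whether real pretrained U-Nets behave this way, which is the empirical claim the paper makes.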

Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts

Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry

Semantic Image Synthesis Via Diffusion Models

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis

A Survey of Data-Driven 2D Diffusion Models for Generating Images from Text

Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance

Diffusion Features to Bridge Domain Gap for Semantic Segmentation

Diffusion Models Without Attention

Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing

Diffusion Models Need Visual Priors for Image Generation

Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

Scalable Diffusion Models with State Space Backbone

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

Nested Diffusion Models Using Hierarchical Latent Priors

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models