Diffusion Models already have a Semantic Latent Space

Mingi Kwon, Jaeseok Jeong, Youngjung Uh
DOI: https://doi.org/10.48550/arXiv.2210.10960
2023-03-29
Abstract: Diffusion models achieve outstanding generative performance in various domains. Despite their great success, they lack semantic latent space which is essential for controlling the generative process. To address the problem, we propose asymmetric reverse process (Asyrp) which discovers the semantic latent space in frozen pretrained diffusion models. Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation: homogeneity, linearity, robustness, and consistency across timesteps. In addition, we introduce a principled design of the generative process for versatile editing and quality boosting by quantifiable measures: editing strength of an interval and quality deficiency at a timestep. Our method is applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES). Project page: https://kwonminki.github.io/Asyrp/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: although diffusion models achieve excellent generative performance, they lack a semantic latent space, which makes the generative process difficult to control. Existing approaches (image guidance, classifier guidance, and fine-tuning the entire model) each have limitations, including non-intuitive control, quality degradation, or the need for additional training. To address this, the authors propose the asymmetric reverse process (Asyrp), which discovers a semantic latent space in a frozen pretrained diffusion model and thereby enables effective control of the generative process. The authors name this semantic latent space **h-space** and show that it has the following properties:

- **Homogeneity**: The same displacement in this space has the same effect on all images.
- **Linearity**: Linear changes in this space lead to linear changes in attributes.
- **Robustness**: Changes in this space do not degrade the quality of the generated images.
- **Consistency across timesteps**: Changes remain almost consistent across timesteps.

In addition, the authors introduce a principled design of the generative process for versatile editing and quality boosting, based on two quantifiable measures: the editing strength of an interval and the quality deficiency at a timestep. The method is applicable to multiple architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES).

In summary, the paper aims to overcome the lack of semantic control in existing diffusion models by introducing Asyrp and h-space, enabling more powerful and flexible image editing.
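
The asymmetric editing idea summarized above can be illustrated with a small sketch. It assumes a deterministic DDIM-style update in which only the predicted-clean-image term uses the shifted feature, while the term pointing back toward the current sample uses the unmodified prediction. The noise predictor `toy_eps`, the shift `dh`, and the alpha values are hypothetical stand-ins chosen for illustration; they are not the authors' code or a real pretrained model.

```python
import numpy as np

def toy_eps(x_t, delta_h=None):
    """Hypothetical stand-in for a frozen noise predictor.

    A real U-Net would expose a bottleneck feature map; here `h` is a toy
    feature, and `delta_h` is an optional shift applied to it.
    """
    h = np.tanh(x_t)                 # toy "bottleneck" feature
    if delta_h is not None:
        h = h + delta_h              # the edit: shift the feature
    return 0.5 * h                   # toy "decoder" (linear on purpose)

def asymmetric_step(x_t, alpha_t, alpha_prev, delta_h):
    """One deterministic DDIM-style step with an asymmetric edit."""
    eps_edit = toy_eps(x_t, delta_h)  # prediction with the shifted feature
    eps_orig = toy_eps(x_t)           # prediction with the original feature
    # The predicted clean image uses the edited prediction...
    pred_x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps_edit) / np.sqrt(alpha_t)
    # ...while the direction back toward x_t uses the original one.
    direction = np.sqrt(1.0 - alpha_prev) * eps_orig
    return np.sqrt(alpha_prev) * pred_x0 + direction

rng = np.random.default_rng(0)
x_t = rng.standard_normal(4)
dh = np.array([0.1, -0.2, 0.0, 0.3])   # hypothetical edit direction
alpha_t, alpha_prev = 0.5, 0.6         # assumed cumulative alpha values

base = asymmetric_step(x_t, alpha_t, alpha_prev, np.zeros(4))
once = asymmetric_step(x_t, alpha_t, alpha_prev, dh)
twice = asymmetric_step(x_t, alpha_t, alpha_prev, 2.0 * dh)
```

With a zero shift the step reduces to an ordinary deterministic DDIM-style update. In this toy setup, doubling `dh` doubles the change in the output only because the toy decoder is linear by construction; it shows what the linearity property means and is not evidence that it holds for a real model.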