An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
2024-04-10
Abstract: Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g. shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity. Webpage:
Machine Learning
What problem does this paper attempt to address?
This paper addresses a problem in image editing with denoising diffusion probabilistic models (DDPMs). The native DDPM noise space has no convenient structure, so its noise maps are hard to manipulate in editing tasks. The paper proposes an alternative latent noise space that enables diverse editing of real images without fine-tuning the model or modifying attention maps, and that can be easily integrated into other algorithms.

Specifically, the main contributions of the paper are:
1. An inversion method that extracts a sequence of DDPM noise maps from a given image (real or synthetic). These noise maps reconstruct the image perfectly and are edit-friendly.
2. The edit-friendly noise maps do not follow a standard normal distribution and are not statistically independent across timesteps, but simple transformations of them produce meaningful changes in the output image, such as shifting and color edits.
3. In a text-conditional model, fixing these noise maps while changing the text prompt changes the semantics while preserving the structure, which enables text-guided editing of real images with the DDPM sampling scheme.
4. The method can be integrated with existing diffusion-based editing methods to improve their quality and diversity.

Experiments on text-guided editing show that the new approach preserves the structure of the input image, edits faster, and produces more diverse results. Compared to DDIM-based inversion methods, it also achieves better structural fidelity.
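
The perfect-reconstruction property described above can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes a linear beta schedule, the DDPM variance choice sigma_t^2 = beta_t, and a toy stand-in for the trained noise predictor (`toy_eps_model`). It also assumes that the noisy states are built by noising the input image with a fresh, independent noise sample at every timestep, which is one reading of how the non-independent noise maps arise and is not stated in the summary. All function names (`make_schedule`, `posterior_mean`, `edit_friendly_inversion`, `ddpm_sample`) are hypothetical.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption made for this sketch)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, k):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return 0.5 * np.tanh(x_t)


def posterior_mean(x_t, k, betas, alphas, alpha_bars, eps_model):
    """Mean of the DDPM reverse step from state k to state k-1 (k is 1-based)."""
    i = k - 1  # index into the 0-based schedule arrays
    eps = eps_model(x_t, k)
    return (x_t - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])


def edit_friendly_inversion(x0, eps_model, betas, alphas, alpha_bars, rng):
    """Extract noise maps z_T..z_1 that reproduce x0 under DDPM sampling."""
    T = len(betas)
    # Noise x0 to every level with an independent noise sample per timestep.
    xs = [x0]
    for k in range(1, T + 1):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(alpha_bars[k - 1]) * x0
                  + np.sqrt(1.0 - alpha_bars[k - 1]) * noise)
    # Solve x_{k-1} = mu_k(x_k) + sigma_k * z_k for z_k, with sigma_k^2 = beta_k.
    zs = [None] * (T + 1)  # zs[0] is unused
    for k in range(T, 0, -1):
        mu = posterior_mean(xs[k], k, betas, alphas, alpha_bars, eps_model)
        zs[k] = (xs[k - 1] - mu) / np.sqrt(betas[k - 1])
    return xs[T], zs


def ddpm_sample(x_T, zs, eps_model, betas, alphas, alpha_bars):
    """Run the DDPM reverse process with fixed noise maps instead of fresh noise."""
    x = x_T
    for k in range(len(betas), 0, -1):
        mu = posterior_mean(x, k, betas, alphas, alpha_bars, eps_model)
        x = mu + np.sqrt(betas[k - 1]) * zs[k]
    return x


# Demo: invert a random "image", then replay the extracted noise maps.
rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule(T=50)
x0 = rng.standard_normal((8, 8))
x_T, zs = edit_friendly_inversion(x0, toy_eps_model, betas, alphas, alpha_bars, rng)
recon = ddpm_sample(x_T, zs, toy_eps_model, betas, alphas, alpha_bars)
print("max reconstruction error:", np.max(np.abs(recon - x0)))
```

Because each z_k is obtained by solving the sampling equation exactly, replaying the maps returns the input up to floating-point error, whatever the noise predictor is. In a text-conditional model, the editing step would keep `x_T` and `zs` fixed and call the sampler with a noise predictor conditioned on a different prompt.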