Abstract: We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions; one is computed from the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction network is given by a linear combination of the standard denoising term and the noise correction term, where the former is designed to reconstruct must-be-preserved regions while the latter aims to effectively edit regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing image-to-image translation methods based on diffusion models. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.

What problem does this paper attempt to address?

This paper addresses two key challenges in text-driven image-to-image translation:

1. **Difficulty in finding the ideal starting point of the reverse diffusion process**: It is hard to determine an initial noise state from which the generated image both reflects the target prompt and keeps the background or structure of the source image unchanged.
2. **Difficulty in editing specific regions**: It is hard to modify only the regions related to the target prompt without distorting the rest of the image.

To address these problems, the authors propose a simple and effective training-free method based on diffusion models. The method revises the standard noise prediction network of a pretrained diffusion model by introducing a noise correction term. Specifically, the noise correction term is the difference between the noise predicted with a progressively interpolated source/target prompt embedding and the noise predicted with the source prompt embedding. This enables selective editing of the region of interest while preserving the overall structure and background of the image.

### Method Overview

The noise prediction network proposed by the authors consists of two parts:

- **Standard denoising term**: Reconstructs the overall structure and background of the source image.
- **Noise correction term**: Selectively modifies the regions related to the target prompt by progressively interpolating the source and target prompt embeddings.
The final noise prediction network is a linear combination of these two parts:

\[ \hat{\epsilon}_\theta(x_{t}^{\text{tgt}}, t, y_{\text{tgt}}) = \epsilon_\theta(x_{t}^{\text{src}}, t, y_{\text{src}}) + \gamma \Delta \epsilon_\theta(x_{t}^{\text{tgt}}, t, y_t) \]

where \(\gamma\) weights the correction, \(y_t\) is the interpolated prompt embedding at step \(t\), and \(\Delta \epsilon_\theta(x_{t}^{\text{tgt}}, t, y_t)\) is the noise correction term, defined as:

\[ \Delta \epsilon_\theta(x_{t}^{\text{tgt}}, t, y_t) = \epsilon_\theta(x_{t}^{\text{tgt}}, t, y_t) - \epsilon_\theta(x_{t}^{\text{tgt}}, t, y_{\text{src}}) \]

### Main Contributions

1. **A new noise prediction strategy**: Progressively updating the text prompt embedding gives a smooth transition from the source prompt to the target prompt.
2. **A noise correction term**: The term ensures that the generated image reflects the target prompt while maintaining the structure and background of the source image.
3. **Experimental validation**: The method performs well on multiple tasks and consistently improves existing methods when combined with them.

In conclusion, this paper tackles the key challenges in text-driven image-to-image translation by introducing a noise correction term built on progressively interpolated prompt embeddings, thereby achieving high-quality image editing.
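The two formulas in the Method Overview can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The summary only says the interpolation is "progressive", so the linear schedule in `interpolate_prompt` is an assumption. `toy_eps_theta` is a hypothetical stand-in for a pretrained denoising network, and all function and parameter names are chosen here for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
NoiseFn = Callable[[Sequence[float], int, Sequence[float]], Vector]


def interpolate_prompt(y_src: Sequence[float], y_tgt: Sequence[float],
                       t: int, num_steps: int) -> Vector:
    """Blend source and target prompt embeddings into y_t.

    Assumption: a linear schedule. t runs from num_steps (start of the
    reverse process) down to 0, so the weight on the target embedding
    grows from 0 to 1 as denoising proceeds.
    """
    beta = 1.0 - t / num_steps
    return [(1.0 - beta) * s + beta * g for s, g in zip(y_src, y_tgt)]


def corrected_noise(eps_theta: NoiseFn,
                    x_src: Sequence[float], x_tgt: Sequence[float],
                    t: int, y_src: Sequence[float], y_tgt: Sequence[float],
                    gamma: float, num_steps: int) -> Vector:
    """Standard denoising term plus gamma times the noise correction term."""
    y_t = interpolate_prompt(y_src, y_tgt, t, num_steps)
    # Standard denoising term: source latent with the source prompt.
    denoise = eps_theta(x_src, t, y_src)
    # Correction term: interpolated-prompt prediction minus source-prompt
    # prediction, both evaluated on the target latent.
    with_interp = eps_theta(x_tgt, t, y_t)
    with_src = eps_theta(x_tgt, t, y_src)
    return [d + gamma * (a - b)
            for d, a, b in zip(denoise, with_interp, with_src)]


def toy_eps_theta(x: Sequence[float], t: int, y: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained noise prediction network."""
    bias = sum(y) / len(y)
    return [0.1 * xi + 0.01 * t + bias for xi in x]


# Example call with toy latents and embeddings.
eps_hat = corrected_noise(
    toy_eps_theta,
    x_src=[1.0, 2.0], x_tgt=[1.0, 2.0],
    t=25, y_src=[0.0, 0.0], y_tgt=[1.0, 1.0],
    gamma=1.5, num_steps=50,
)
```

With `gamma = 0`, or with identical source and target prompts, the correction vanishes and the function returns the plain source-prompt prediction. This matches the role of the standard denoising term as the part that preserves the source image.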

Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
