ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories

Zijian Zhang, Zhou Zhao, Jun Yu, Qi Tian
DOI: https://doi.org/10.48550/arXiv.2302.02373
2023-03-25
Abstract: Diffusion models have recently exhibited remarkable abilities to synthesize striking image samples since the introduction of denoising diffusion probabilistic models (DDPMs). Their key idea is to disrupt images into noise through a fixed forward process and learn its reverse process to generate samples from noise in a denoising way. For conditional DDPMs, most existing practices relate conditions only to the reverse process and fit it to the reversal of unconditional forward process. We find this will limit the condition modeling and generation in a small time window. In this paper, we propose a novel and flexible conditional diffusion model by introducing conditions into the forward process. We utilize extra latent space to allocate an exclusive diffusion trajectory for each condition based on some shifting rules, which will disperse condition modeling to all timesteps and improve the learning capacity of model. We formulate our method, which we call **ShiftDDPMs**, and provide a unified point of view on existing related methods. Extensive qualitative and quantitative experiments on image synthesis demonstrate the feasibility and effectiveness of ShiftDDPMs.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper asks how conditional information can be used more effectively to improve image generation quality in conditional diffusion models (CDMs). Existing conditional diffusion models mostly introduce the condition only in the reverse process, which confines condition modeling and generation to a small time window and restricts the model's learning capacity. The paper proposes a new method, ShiftDDPMs, which introduces the condition in the forward process so that the diffusion trajectories of different conditions stay separated across all timesteps, improving both the learning capacity and the generation quality of the model.

### Main Contributions

1. **Systematic introduction of a conditional forward process**: For the first time, the paper systematically explores how to design controllable, condition-dependent diffusion trajectories in diffusion models, and it provides a unified perspective on existing related methods.
2. **Better use of the latent space and improved learning capacity**: By shifting the diffusion trajectories, ShiftDDPMs make better use of the latent space and improve the learning capacity of the model.
3. **Extensive experimental verification**: The paper verifies the feasibility and effectiveness of ShiftDDPMs on various image synthesis tasks through a large number of qualitative and quantitative experiments.

### Method Overview

- **Conditional forward process**: The paper proposes a conditional forward process that shifts the diffusion trajectory by introducing conditional information at each timestep.
  The specific form is as follows:

  \[ q(x_t \mid x_0, c) = \mathcal{N}\left(\sqrt{\bar{\alpha}_t}\,x_0 + s_t,\ (1 - \bar{\alpha}_t)\Sigma\right) \]

  where \(s_t = k_t \cdot E(c)\) is the cumulative mean shift of the diffusion trajectory at step \(t\), \(k_t\) is a shift coefficient schedule that determines the shift pattern, and \(E(c)\) is a function that maps the condition to the latent space.
- **Parameterized reverse process**: The paper also proposes a parameterized reverse process that is fit to the reversal of this forward process. The specific form is as follows:

  \[ p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\left(\frac{1}{\sqrt{\alpha_t}}\left[x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,g_\theta(x_t, t)\right] - \sqrt{\alpha_t}\,\frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,s_t + s_{t-1},\ \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t\,\Sigma\right) \]

### Experimental Results

- **Effectiveness of conditional sampling**: On the MNIST dataset, the paper tests different shift patterns (Prior-Shift, Data-Normalization, Quadratic-Shift), and all models successfully generate conditional images.
- **Sample quality assessment**: On the CIFAR-10 dataset, the paper evaluates different models with Inception Score, FID, and negative log-likelihood. The results show that ShiftDDPMs outperform traditional conditional diffusion models on multiple metrics.
- **Fast sampling adaptability**: The paper also adapts ShiftDDPMs to the DDIM framework to achieve fast sampling while maintaining high-quality generation.
- **Diffusion trajectory interpolation**: By interpolating between the diffusion trajectories of different conditions to generate images with mixed features, the paper verifies the decoupling properties of the diffusion trajectories.
- **Image inpainting and text-to-image generation**: The paper also runs experiments on image inpainting and text-to-image generation to further verify the effectiveness of ShiftDDPMs.

### Conclusion

The paper proposes...
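
As a rough illustration of the two formulas in the Method Overview, the following NumPy sketch draws a sample from the shifted forward process and evaluates the mean of the reverse step. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the choice \(\Sigma = I\), the convention \(s_0 = 0\), the schedule `k = 1 - sqrt(alpha_bar)`, and the function names are all illustrative and not taken from the summary. The network term is scaled by \(\beta_t / \sqrt{1 - \bar{\alpha}_t}\), as in standard DDPM, and `g` stands in for the output of a trained network \(g_\theta\).

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the summary does not specify one)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alpha_1..alpha_t
    return betas, alphas, alpha_bars

def shift(k, e_c, t):
    """s_t = k_t * E(c). t is 1-indexed; s_0 = 0 is an assumed convention."""
    return np.zeros_like(e_c) if t == 0 else k[t - 1] * e_c

def forward_sample(x0, e_c, t, alpha_bars, k, noise):
    """x_t ~ q(x_t | x_0, c) with Sigma = I (assumed):
    sqrt(alpha_bar_t) * x0 + s_t + sqrt(1 - alpha_bar_t) * noise."""
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * x0 + shift(k, e_c, t) + np.sqrt(1.0 - a_bar) * noise

def reverse_mean(x_t, g, e_c, t, betas, alphas, alpha_bars, k):
    """Mean of p_theta(x_{t-1} | x_t, c); g plays the role of g_theta(x_t, t)."""
    a, b, a_bar = alphas[t - 1], betas[t - 1], alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0  # alpha_bar_0 = 1
    s_t, s_prev = shift(k, e_c, t), shift(k, e_c, t - 1)
    base = (x_t - b / np.sqrt(1.0 - a_bar) * g) / np.sqrt(a)
    return base - np.sqrt(a) * (1.0 - a_bar_prev) / (1.0 - a_bar) * s_t + s_prev

# Usage with toy data: a 4-dimensional "image" and a hypothetical embedding E(c).
betas, alphas, alpha_bars = make_schedule(T=100)
k = 1.0 - np.sqrt(alpha_bars)          # illustrative shift schedule (assumption)
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
e_c = rng.standard_normal(4)           # stands in for E(c)
x_t = forward_sample(x0, e_c, 50, alpha_bars, k, rng.standard_normal(4))
g = rng.standard_normal(4)             # placeholder for a trained network output
mu = reverse_mean(x_t, g, e_c, 50, betas, alphas, alpha_bars, k)
```

Setting `k` to all zeros removes every shift term, in which case `reverse_mean` reduces to the usual unconditional DDPM posterior mean.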