Abstract: How do diffusion generative models convert pure noise into meaningful images? In a variety of pretrained diffusion models (including conditional latent space models like Stable Diffusion), we observe that the reverse diffusion process that underlies image generation has the following properties: (i) individual trajectories tend to be low-dimensional and resemble 2D 'rotations'; (ii) high-variance scene features like layout tend to emerge earlier, while low-variance details tend to emerge later; and (iii) early perturbations tend to have a greater impact on image content than later perturbations. To understand these phenomena, we derive and study a closed-form solution to the probability flow ODE for a Gaussian distribution, which shows that the reverse diffusion state rotates towards a gradually-specified target on the image manifold. It also shows that generation involves first committing to an outline, and then to finer and finer details. We find that this solution accurately describes the initial phase of image generation for pretrained models, and can in principle be used to make image generation more efficient by skipping reverse diffusion steps. Finally, we use our solution to characterize the image manifold in Stable Diffusion. Our viewpoint reveals an unexpected similarity between generation by GANs and diffusion and provides a conceptual link between diffusion and image retrieval.

What problem does this paper attempt to address?

The paper examines how diffusion generative models behave during image generation and proposes a theoretical framework to explain that behavior. It focuses on three observations:

1. **Generation Order**: Across a range of pretrained diffusion models, reverse diffusion proceeds "outline first, details later": the general layout of the scene appears first, and finer details are filled in afterward.
2. **Trajectory Characteristics**: A single generation trajectory is often low-dimensional and resembles a 2D rotation. During the transition from pure noise to a meaningful image, the path of the image state can be approximated as rotational motion in a plane.
3. **Perturbation Impact**: Perturbations applied early in generation tend to change the final image content more than perturbations applied later.

To understand these phenomena, the authors derive a closed-form solution to the probability flow ordinary differential equation (ODE) for a Gaussian distribution and use it to show how the reverse diffusion state gradually approaches the target image. This solution describes the initial phase of image generation by pretrained models and can be used to make generation more efficient, for example by skipping some reverse diffusion steps. The paper also proposes a method, based on this analytical theory, to accelerate sampling from unconditional diffusion models: the Gaussian analytical solution is used for "teleportation", which reduces the number of neural network function evaluations required. In the reported experiments, this approach increases generation speed while maintaining the quality of the generated images.

Finally, by analyzing sampling trajectories, the paper provides a way to characterize the image manifold, which helps clarify the internal working mechanisms of diffusion models and the spatial structure of generated images.
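
To make the Gaussian closed-form solution mentioned above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' code: it assumes a forward process of the form x_t = alpha_t * x_0 + sigma_t * noise and data distributed as N(mu, cov). The function name `gaussian_pf_ode_state` and all parameter names are hypothetical. Under those assumptions, each component of the state along an eigenvector of `cov` scales in proportion to the standard deviation of the noised distribution in that direction.

```python
import numpy as np

def gaussian_pf_ode_state(x_T, mu, cov, alpha_t, sigma_t, alpha_T, sigma_T):
    """Probability flow ODE state at time t for Gaussian data N(mu, cov).

    Assumes the forward process x_t = alpha_t * x_0 + sigma_t * noise,
    so the noised marginal is N(alpha_t * mu, alpha_t**2 * cov + sigma_t**2 * I).
    """
    # Eigen-decomposition of the data covariance: cov = U diag(lam) U^T.
    lam, U = np.linalg.eigh(cov)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues

    # Coordinates of the starting state, relative to the noised mean,
    # in the eigenbasis of the covariance.
    c_T = U.T @ (x_T - alpha_T * mu)

    # Each coordinate is rescaled by the ratio of marginal standard
    # deviations at time t and at the starting time T.
    psi = np.sqrt((alpha_t**2 * lam + sigma_t**2) /
                  (alpha_T**2 * lam + sigma_T**2))

    return alpha_t * mu + U @ (psi * c_T)

# Toy 2D example (hypothetical numbers): one high-variance direction
# and one low-variance direction.
mu = np.array([1.0, -1.0])
cov = np.diag([4.0, 0.25])
x_T = np.array([1.0, 1.0])  # starting noise sample

# Jump directly from pure noise (alpha=0, sigma=1) to the noise-free
# endpoint (alpha=1, sigma=0) without any intermediate steps.
x_0 = gaussian_pf_ode_state(x_T, mu, cov, 1.0, 0.0, 0.0, 1.0)
```

Evaluating the function at an intermediate noise level instead of at the endpoint gives the kind of jump that the summary calls "teleportation": the state is moved ahead analytically, and a pretrained network would only be needed for the remaining steps. How far such a jump can go before the Gaussian approximation breaks down is not specified here.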

Diffusion Models Generate Images Like Painters: an Analytical Theory of Outline First, Details Later

Explaining generative diffusion models via visual analysis for interpretable decision-making process

Understanding and contextualising diffusion models

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

Nested Diffusion Processes for Anytime Image Generation

Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Diffusion idea exploration for art generation

Image Neural Field Diffusion Models

Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?

Renormalization Group flow, Optimal Transport and Diffusion-based Generative Model

Gradient Domain Diffusion Models for Image Synthesis

Guided Image Synthesis via Initial Image Editing in Diffusion Model

Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency

EvolvED: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models

Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces

Differential Diffusion: Giving Each Pixel Its Strength

Efficient image generation with Contour Wavelet Diffusion

Diffusion Models Need Visual Priors for Image Generation

Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance

A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions

A Survey on Generative Diffusion Models