Abstract: Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems to be solved in the future and suggest some possible solutions. Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models.

What problem does this paper attempt to address?

The paper addresses how to integrate user-specified conditions effectively in conditional image synthesis, so that generated images are of high quality and meet diverse needs. It focuses specifically on conditional image synthesis methods based on diffusion models (DMs).

### Main Issues

1. **Complexity of Condition Integration**: Conditions can be integrated into diffusion models in many ways. The variety of conditioning mechanisms and task types makes it difficult for researchers to keep up with rapid developments and understand the core concepts.
2. **Diversity of Model Architectures and Training Methods**: Diffusion model architectures, training methods, and sampling techniques come in many variants, which makes it hard to grasp the overall landscape of the field.
3. **Breadth of Conditional Synthesis Tasks**: Conditional image synthesis covers many task types, including text-to-image generation, image restoration, and image editing, each with its own condition integration requirements.

### Solutions

The paper addresses these issues through the following approaches:

1. **Classifying Existing Work**: Categorizing existing work by how conditions are integrated into the two fundamental components of diffusion-based modeling: the denoising network and the sampling process.
2. **Detailed Discussion of Condition Integration Mechanisms**: Highlighting the underlying principles, advantages, and potential challenges of the conditioning approaches used in the training, re-purposing, and specialization stages of building a denoising network.
3. **Summarizing Mainstream Conditioning Mechanisms**: Summarizing six mainstream conditioning mechanisms in the sampling process.
4. **Proposing Future Research Directions**: Identifying critical open problems in current research and suggesting possible solutions.

### Objective

The paper aims to give researchers a clear, structured overview that helps them understand, design, and implement conditional image synthesis frameworks based on diffusion models, for a variety of tasks including ones not yet explored.
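
The two integration points named in the summary (the denoising network and the sampling process) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the survey: `toy_denoiser` is a fixed formula standing in for a trained network, `sample` is a simplified deterministic update loop rather than a real diffusion sampler, and the guidance-style combination in `guided_eps` is only one well-known example of sampling-time conditioning, not necessarily one of the six mechanisms the survey enumerates.

```python
import random


def toy_denoiser(x_t, t, cond=None):
    """Hypothetical stand-in for a denoising network eps(x_t, t, c).

    Integration point 1: the condition is an input to the network.
    This is a fixed formula, not a trained model; t is accepted only
    to mirror the usual signature and is ignored.
    """
    target = 0.0 if cond is None else cond
    # The predicted "noise" is the offset of x_t from the target.
    return [xi - target for xi in x_t]


def guided_eps(x_t, t, cond, scale):
    """Integration point 2: combine predictions during sampling.

    Classifier-free-guidance-style rule:
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
    """
    eps_u = toy_denoiser(x_t, t, None)
    eps_c = toy_denoiser(x_t, t, cond)
    return [u + scale * (c - u) for u, c in zip(eps_u, eps_c)]


def sample(cond, scale, steps=50, step_size=0.1, dim=4, seed=0):
    """Toy deterministic loop: start from Gaussian noise, remove predicted noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        eps = guided_eps(x, t, cond, scale)
        x = [xi - step_size * e for xi, e in zip(x, eps)]
    return x


if __name__ == "__main__":
    for w in (0.0, 1.0, 2.0):
        print(w, [round(v, 2) for v in sample(cond=1.0, scale=w)])
```

In this toy setting, a scale of 0 ignores the condition, a scale of 1 follows the conditional prediction, and larger scales push the result further in the direction of the condition. The network weights are never changed; only the sampling rule differs, which is the distinction the survey's taxonomy draws.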

Conditional Image Synthesis with Diffusion Models: A Survey

A Simple Approach to Unifying Diffusion-based Conditional Generation

Conditional Generation from Unconditional Diffusion Models using Denoiser Representations

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

Diffusion Model-Based Image Editing: A Survey

Med-cDiff: Conditional Medical Image Generation with Diffusion Models

Text-to-image Diffusion Models in Generative AI: A Survey

Adaptively Controllable Diffusion Model for Efficient Conditional Image Generation

Controllable Generation with Text-to-Image Diffusion Models: A Survey

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

Conditional sampling within generative diffusion models

Conditional Image Generation with Pretrained Generative Model

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models

Diffusion Models in Low-Level Vision: A Survey

Semantic Image Synthesis Via Diffusion Models

Diffusion Models: A Comprehensive Survey of Methods and Applications

ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

Semantic Image Synthesis for Abdominal CT