Conditional Image Synthesis with Diffusion Models: A Survey

Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang
2024-10-03
Abstract: Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems to be solved in the future and suggest some possible solutions. Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to integrate user-specified conditions into conditional image synthesis so that the generated images are high quality and meet diverse needs. It focuses specifically on methods based on Diffusion Models (DMs).

### Main Issues

1. **Complexity of Condition Integration**: Integrating conditions into diffusion models is complex. The variety of conditioning mechanisms and task types makes it difficult for researchers to keep up with rapid developments and understand the core concepts.
2. **Diversity of Model Architectures and Training Methods**: Diffusion models come in many variants of architecture, training method, and sampling technique, which makes it hard to grasp the overall landscape of the field.
3. **Breadth of Conditional Synthesis Tasks**: Conditional image synthesis covers many task types, including text-to-image generation, image restoration, and image editing, each with its own requirements for condition integration.

### Solutions

The paper addresses these issues through the following approaches:

1. **Classifying Existing Work**: Categorizing existing work by how conditions are integrated into the two fundamental components of diffusion-based modeling: the denoising network and the sampling process.
2. **Detailed Discussion of Condition Integration Mechanisms**: Highlighting the underlying principles, advantages, and potential challenges of the various conditioning approaches used in the training, re-purposing, and specialization stages of building a denoising network.
3. **Summarizing Mainstream Conditioning Mechanisms**: Summarizing six mainstream conditioning mechanisms in the sampling process.
4. **Proposing Future Research Directions**: Identifying critical open problems in current research and suggesting possible solutions.
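As a rough illustration of the two integration points named above (the denoising network and the sampling process), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_denoiser` is a hypothetical linear stand-in for a trained network, and classifier-free guidance is used only as one well-known example of sampling-time conditioning. The summary above does not name the six mechanisms the survey covers, so this should not be read as the survey's own taxonomy.

```python
import numpy as np


def toy_denoiser(x_t, t, cond=None):
    """Hypothetical stand-in for a denoising network eps_theta(x_t, t, c).

    Integration point 1: the condition is an extra input to the network.
    cond=None plays the role of the unconditional (null-condition) branch.
    """
    shift = 0.0 if cond is None else cond
    return 0.1 * x_t * t + shift


def guided_prediction(x_t, t, cond, guidance_scale):
    """Integration point 2: the condition steers the sampling process.

    Uses the classifier-free guidance combination:
        eps = eps_uncond + w * (eps_cond - eps_uncond)
    """
    eps_uncond = toy_denoiser(x_t, t, cond=None)
    eps_cond = toy_denoiser(x_t, t, cond=cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


x_t = np.ones(4)  # toy "noisy image" with four pixels
eps_plain = guided_prediction(x_t, t=0.5, cond=2.0, guidance_scale=1.0)
eps_strong = guided_prediction(x_t, t=0.5, cond=2.0, guidance_scale=3.0)
```

With `guidance_scale=1.0` the result is just the conditional prediction, and with `guidance_scale=0.0` it falls back to the unconditional one. Larger scales push the prediction further in the direction the condition suggests, without retraining the network.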
### Objective

The paper aims to give researchers a clear, structured overview of conditional image synthesis with diffusion models, so that they can better understand existing approaches and more effectively design and implement frameworks for a range of tasks, including new tasks that have not yet been explored.