Abstract: Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, are widely acclaimed for their ability to produce samples of superior quality and diversity, which yields visually compelling results with intricate texture information. Despite their remarkable success, there is still no comprehensive survey that brings together these pioneering diffusion model-based works and organizes the corresponding threads. This paper presents a comprehensive review of diffusion model-based techniques for low-level vision. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task. Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, covering both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we discuss the limitations of current diffusion models and propose seven directions for future research. This examination aims to facilitate a deeper understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.

Do text-free diffusion models learn discriminative visual representations?

Diffusion Models and Representation Learning: A Survey

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

Diffusion Models in Low-Level Vision: A Survey

NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Are Diffusion Models Vision-And-Language Reasoners?

Diffusion Models in Vision: A Survey

Large-scale Reinforcement Learning for Diffusion Models

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

UniFL: Improve Latent Diffusion Model via Unified Feedback Learning

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts

Diffusion Models Without Attention

Unleashing Text-to-Image Diffusion Models for Visual Perception

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models