Diffusion Model-Based Image Editing: A Survey

Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao
2024-03-16
Abstract: Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research. The accompanying repository is released at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the lack of a comprehensive survey of diffusion model-based image editing. Specifically, it aims to:

1. **Provide an exhaustive overview of existing methods**: Cover diffusion model-based image editing approaches from both theoretical and practical standpoints.
2. **Analyze and categorize these methods in depth**: Organize the works from multiple perspectives, including learning strategies, user-input conditions, and the specific editing tasks they can accomplish.
3. **Pay special attention to image inpainting and outpainting**: Examine both earlier traditional context-driven methods and current multimodal conditional methods, with a comprehensive analysis of their methodologies.
4. **Propose a systematic benchmark**: Introduce EditEval, a benchmark for evaluating text-guided image editing algorithms, together with a new evaluation metric, LMM Score.
5. **Discuss current limitations and future research directions**: Identify the shortcomings of current work and outline potential directions for future research.

Taken together, these aims are meant to give researchers working on diffusion model-based image editing a systematic resource that both summarizes existing work and points to future research directions.
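
To make the abstract's core idea concrete (gradually adding noise to an image, then learning to reverse that process), here is a minimal sketch of the noising step. It assumes the standard DDPM-style formulation with a linear variance schedule, which the abstract itself does not specify. The function names, the schedule endpoints, and the four-value toy "image" are all illustrative choices, not taken from the survey. No model is trained here: the last step recovers the clean values using the exact noise, which is the quantity a trained denoising network would have to predict.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, noise, t, alpha_bars):
    """Invert the closed form given the noise (a trained model would predict this noise)."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, noise)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four pixel values (hypothetical data)
xt, noise = add_noise(x0, 500, alpha_bars, rng)
x0_hat = recover_x0(xt, noise, 500, alpha_bars)
```

Because `alpha_bars` shrinks toward zero as the step index grows, late steps are almost pure Gaussian noise. Generation and editing methods start from such noisy states and apply a learned denoiser step by step, optionally steered by conditions such as text or masks.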