Improving the Stability and Efficiency of Diffusion Models for Content Consistent Super-Resolution

Lingchen Sun, Rongyuan Wu, Jie Liang, Zhengqiang Zhang, Hongwei Yong, Lei Zhang

2024-09-25

Abstract: The generative priors of pre-trained latent diffusion models (DMs) have demonstrated great potential to enhance the visual quality of image super-resolution (SR) results. However, the noise sampling process in DMs introduces randomness in the SR outputs, and the generated contents can differ substantially across noise samples. The multi-step diffusion process can be accelerated by distillation methods, but the generative capacity is difficult to control. To address these issues, we analyze the respective advantages of DMs and generative adversarial networks (GANs) and propose to partition the generative SR process into two stages, where the DM is employed for reconstructing image structures and the GAN is employed for improving fine-grained details. Specifically, we propose a non-uniform timestep sampling strategy in the first stage. A single timestep sampling is first applied to extract the coarse information from the input image, then a few reverse steps are used to reconstruct the main structures. In the second stage, we finetune the decoder of the pre-trained variational auto-encoder by adversarial GAN training for deterministic detail enhancement. Once trained, our proposed method, namely content consistent super-resolution (CCSR), allows flexible use of different diffusion steps in the inference stage without re-training. Extensive experiments show that with 2 or even 1 diffusion step, CCSR can significantly improve the content consistency of SR outputs while keeping high perceptual quality. Codes and models can be found at https://github.com/csslc/CCSR.
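
The non-uniform timestep sampling described in the abstract can be pictured as one large jump followed by a few evenly spaced reverse steps. The sketch below is only an illustration of that idea, not the authors' implementation: the function name `nonuniform_schedule`, the parameters `t_max` and `t_min`, and the example values (1000, 600, 200) are all assumptions introduced here and do not come from the paper.

```python
def nonuniform_schedule(total_steps, t_max, t_min, num_reverse_steps):
    """Build a list of (from_t, to_t) transitions, in sampling order.

    Hypothetical sketch: the first transition is a single large jump from
    `total_steps` down to `t_max` (the "single timestep sampling"), followed
    by `num_reverse_steps` evenly spaced reverse steps from `t_max` to `t_min`.
    """
    if not (0 <= t_min < t_max <= total_steps):
        raise ValueError("expected 0 <= t_min < t_max <= total_steps")
    if num_reverse_steps < 1:
        raise ValueError("num_reverse_steps must be at least 1")

    # Evenly spaced timesteps from t_max down to t_min, both inclusive.
    stride = (t_max - t_min) / num_reverse_steps
    inner = [round(t_max - i * stride) for i in range(num_reverse_steps + 1)]

    # One coarse jump, then the uniform reverse steps.
    transitions = [(total_steps, t_max)]
    transitions += list(zip(inner[:-1], inner[1:]))
    return transitions


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    for src, dst in nonuniform_schedule(1000, 600, 200, 2):
        print(f"{src} -> {dst}")
```

Changing `num_reverse_steps` alters only the evenly spaced part of the schedule, which matches the abstract's claim that the number of diffusion steps can be varied at inference time without re-training.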

Image and Video Processing, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper aims to address the issues of stability and efficiency in image super-resolution (SR) technology based on Diffusion Models (DM). Specifically:

1. **Instability caused by randomness**:
   - Diffusion models introduce noise sampling steps during the generation process, leading to variations in the results each time, especially in terms of texture and details.
   - This randomness affects the content consistency and fidelity of the generated results.
2. **Efficiency issues in multi-step diffusion processes**:
   - Although multi-step diffusion can improve generation quality, the process is time-consuming and difficult to control in terms of generation capability.
   - Single-step diffusion can accelerate the generation process but results in a decline in the quality of the generated output.

To address these issues, the authors propose a method called Content Consistent Super-Resolution (CCSR), which divides the super-resolution process into two stages:

- The first stage uses a diffusion model to generate the basic structure of the image;
- The second stage enhances details through adversarial training (GAN).

This method not only improves the consistency and visual quality of the generated results but also can be flexibly applied in single-step or multi-step diffusion to meet the needs of different scenarios.
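
One simple way to make the "variation across noise samples" concrete is to super-resolve the same input several times and measure how much each pixel changes between runs. The sketch below is an assumed illustration, not a metric taken from the paper: the function name `mean_pixelwise_std` and the toy inputs are hypothetical, and images are represented as flat lists of intensities to keep the example dependency-free.

```python
from statistics import pstdev


def mean_pixelwise_std(outputs):
    """Average, over pixels, of the standard deviation across SR outputs.

    `outputs` is a list of images produced from the same input with
    different noise samples; each image is a flat list of pixel
    intensities of equal length. A value of 0.0 means every run produced
    identical pixels; larger values mean less consistent content.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to measure variation")
    n_pixels = len(outputs[0])
    if any(len(img) != n_pixels for img in outputs):
        raise ValueError("all outputs must have the same number of pixels")

    # Population standard deviation of each pixel across the runs.
    per_pixel = [pstdev([img[i] for img in outputs]) for i in range(n_pixels)]
    return sum(per_pixel) / n_pixels


if __name__ == "__main__":
    # Toy data (assumed): three runs over a 4-pixel image.
    runs = [[10, 20, 30, 40], [12, 20, 28, 40], [11, 20, 29, 40]]
    print(mean_pixelwise_std(runs))
```

A fully deterministic pipeline would score 0.0 under this measure, which is the sense in which a deterministic detail-enhancement stage removes one source of run-to-run variation.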

CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution

ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

A Conditional Diffusion Model With Fast Sampling Strategy for Remote Sensing Image Super-Resolution

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

Single image super-resolution with denoising diffusion GANS

AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

SRDiff: Single image super-resolution with diffusion probabilistic models

Diffusion Model with Detail Complement for Super-Resolution of Remote Sensing

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach

DSR-Diff: Depth Map Super-Resolution with Diffusion Model

DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution

Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution