Improving the Stability and Efficiency of Diffusion Models for Content Consistent Super-Resolution

Lingchen Sun, Rongyuan Wu, Jie Liang, Zhengqiang Zhang, Hongwei Yong, Lei Zhang
2024-09-25
Abstract: The generative priors of pre-trained latent diffusion models (DMs) have demonstrated great potential to enhance the visual quality of image super-resolution (SR) results. However, the noise sampling process in DMs introduces randomness in the SR outputs, and the generated contents can differ a lot with different noise samples. The multi-step diffusion process can be accelerated by distilling methods, but the generative capacity is difficult to control. To address these issues, we analyze the respective advantages of DMs and generative adversarial networks (GANs) and propose to partition the generative SR process into two stages, where the DM is employed for reconstructing image structures and the GAN is employed for improving fine-grained details. Specifically, we propose a non-uniform timestep sampling strategy in the first stage. A single timestep sampling is first applied to extract the coarse information from the input image, then a few reverse steps are used to reconstruct the main structures. In the second stage, we finetune the decoder of the pre-trained variational auto-encoder by adversarial GAN training for deterministic detail enhancement. Once trained, our proposed method, namely content consistent super-resolution (CCSR), allows flexible use of different diffusion steps in the inference stage without re-training. Extensive experiments show that with 2 or even 1 diffusion step, CCSR can significantly improve the content consistency of SR outputs while keeping high perceptual quality. Codes and models can be found at https://github.com/csslc/CCSR.
Image and Video Processing, Computer Vision and Pattern Recognition
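The abstract describes the first-stage non-uniform timestep sampling only at a high level: one step from a large timestep to capture coarse information, then a few reverse steps for the main structures. The Python sketch below shows one way such a schedule could be laid out. It assumes a 1000-step training schedule, and the parameters `t_start`, `t_coarse`, and `t_end` and the function `non_uniform_schedule` are hypothetical illustrations, not values or code from CCSR. The denoiser that would consume each `(t_from, t_to)` pair is omitted.

```python
from typing import List, Tuple


def non_uniform_schedule(
    num_steps: int,
    t_start: int = 666,   # hypothetical starting timestep (training schedule T = 1000)
    t_coarse: int = 200,  # hypothetical timestep reached by the first, large jump
    t_end: int = 0,       # final timestep (clean latent)
) -> List[Tuple[int, int]]:
    """Return (t_from, t_to) pairs for a hypothetical non-uniform reverse schedule.

    Step 1 jumps straight from t_start to t_coarse (or to t_end if only one
    step is requested) to capture coarse information; the remaining steps
    split the interval [t_end, t_coarse] uniformly.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    if not t_start > t_coarse > t_end >= 0:
        raise ValueError("require t_start > t_coarse > t_end >= 0")
    if num_steps == 1:
        return [(t_start, t_end)]
    remaining = num_steps - 1
    if remaining > t_coarse - t_end:
        raise ValueError("too many steps for the interval [t_end, t_coarse]")
    # Uniformly spaced integer timesteps from t_coarse down to t_end.
    points = [
        round(t_coarse - k * (t_coarse - t_end) / remaining)
        for k in range(remaining + 1)
    ]
    return [(t_start, t_coarse)] + list(zip(points[:-1], points[1:]))


for n in (1, 2, 4):
    print(n, non_uniform_schedule(n))
# 1 [(666, 0)]
# 2 [(666, 200), (200, 0)]
# 4 [(666, 200), (200, 133), (133, 67), (67, 0)]
```

In this sketch the schedule is built at inference time from `num_steps` alone, so the step count can change without touching any trained weights, which loosely mirrors the flexibility the abstract claims for CCSR.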
### What problem does this paper attempt to solve?

This paper addresses the stability and efficiency of image super-resolution (SR) methods built on diffusion models (DMs). Specifically:

1. **Instability caused by randomness**
   - Diffusion models sample noise during generation, so repeated runs on the same input produce different results, especially in textures and fine details.
   - This randomness undermines the content consistency and fidelity of the SR outputs.
2. **Efficiency of the multi-step diffusion process**
   - Running many diffusion steps is time-consuming.
   - Distillation methods can reduce the number of steps, but the generative capacity of the distilled model is difficult to control.

To address these issues, the authors propose Content Consistent Super-Resolution (CCSR), which divides the generative SR process into two stages:

- In the first stage, a diffusion model reconstructs the main image structures using a non-uniform timestep sampling strategy: a single timestep sampling extracts coarse information from the input image, and a few reverse steps then reconstruct the structures.
- In the second stage, the decoder of the pre-trained variational auto-encoder is finetuned with adversarial (GAN) training to enhance fine details deterministically.

Once trained, CCSR can run with different numbers of diffusion steps at inference without retraining. According to the authors, with 2 or even 1 diffusion step it significantly improves the content consistency of SR outputs while keeping high perceptual quality.
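The instability caused by noise sampling can be quantified by running the same input with several noise seeds and measuring how much each pixel varies across runs. The Python sketch below is an illustrative measure only, not necessarily the metric used in the paper: `toy_sampler` is a hypothetical stand-in for an SR model (a fixed restoration plus seed-dependent noise), and `mean_pixel_std` averages the per-pixel standard deviation over all pixels.

```python
import random
import statistics
from typing import List

Image = List[float]  # flattened grayscale image, kept tiny for illustration


def toy_sampler(lr: Image, seed: int, noise_scale: float) -> Image:
    """Hypothetical stand-in for an SR model: a fixed 'restoration' (here the
    identity) plus seed-dependent detail whose amplitude is noise_scale."""
    rng = random.Random(seed)
    return [p + noise_scale * rng.gauss(0.0, 1.0) for p in lr]


def mean_pixel_std(outputs: List[Image]) -> float:
    """Average over pixels of the standard deviation across repeated runs."""
    per_pixel = zip(*outputs)  # tuples holding the same pixel from every run
    return statistics.fmean(statistics.pstdev(vals) for vals in per_pixel)


lr = [0.1, 0.5, 0.9, 0.3]
seeds = range(8)
stochastic = [toy_sampler(lr, s, noise_scale=0.05) for s in seeds]
deterministic = [toy_sampler(lr, s, noise_scale=0.0) for s in seeds]
print(mean_pixel_std(stochastic) > 0.0)  # True: outputs change with the seed
print(mean_pixel_std(deterministic))     # 0.0: seed has no effect
```

In this toy, `noise_scale=0.0` plays the role of a fully deterministic mapping and scores exactly 0; a lower score means more consistent content across noise samples, which is the property CCSR targets.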