Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

Ruibin Li, Qihua Zhou, Song Guo, Jie Zhang, Jingcai Guo, Xinyang Jiang, Yifei Shen, Zhenhua Han
2023-06-01
Abstract: Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up the opportunity to improve image super-resolution (SR) tasks. Recent solutions for these tasks often train architecture-specific DGMs from scratch, or require iterative fine-tuning and distillation on pre-trained DGMs, both of which take considerable time and hardware investments. More seriously, since the DGMs are established with a discrete pre-defined upsampling scale, they cannot well match the emerging requirements of arbitrary-scale super-resolution (ASSR), where a unified model adapts to arbitrary upsampling scales, instead of preparing a series of distinct models for each case. These limitations beg an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without the need for distillation or fine-tuning? In this paper, we take a step towards resolving this matter by proposing Diff-SR, a first ASSR attempt based solely on pre-trained DGMs, without additional training efforts. It is motivated by an exciting finding that a simple methodology, which first injects a specific amount of noise into the low-resolution images before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight is determining a suitable amount of noise to inject, i.e., small amounts lead to poor low-level fidelity, while over-large amounts degrade the high-level signature. Through a finely-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions under diverse ASSR environments.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the limitations of existing diffusion-based generative models (DGMs) on arbitrary-scale super-resolution (ASSR) tasks. Current DGM-based super-resolution solutions either train architecture-specific models from scratch or apply iterative fine-tuning and distillation to pre-trained DGMs, both of which are time-consuming and hardware-intensive. More importantly, these models are built around a discrete, pre-defined upsampling scale, so they cannot meet the ASSR requirement that a single unified model handle any upsampling scale, rather than one model being prepared per scale. To overcome these problems, the paper proposes Diff-SR, the first ASSR attempt based solely on pre-trained DGMs, with no additional training. The core idea of Diff-SR is to inject a specific amount of noise into the low-resolution image and then invoke the DGM's reverse diffusion process to restore it. The key lies in choosing the amount of noise: too little leads to poor low-level fidelity, while too much degrades the high-level signature of the image. To balance these two factors, the paper introduces the Perceptual Recoverable Field (PRF), a metric derived through theoretical analysis, and verifies the effectiveness and flexibility of the approach experimentally.
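The noise-injection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a DDPM-style forward process with a linear beta schedule, uses nearest-neighbor resizing as a stand-in for whatever interpolation the paper uses, and picks the timestep `t` by hand. All function names are hypothetical. How the paper derives the noise level from the PRF is not reproduced here, and the reverse diffusion call is omitted because it requires a pre-trained model.

```python
import numpy as np

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule (DDPM-style); the paper's schedule may differ.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def upsample_nearest(image, out_h, out_w):
    # Nearest-neighbor resize to an arbitrary target size (stand-in interpolation).
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows][:, cols]

def inject_noise(image, t, alpha_bar, rng):
    # Forward diffusion in closed form:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.standard_normal(image.shape)
    return np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
low_res = rng.uniform(-1.0, 1.0, size=(16, 16, 3))  # synthetic image in [-1, 1]
alpha_bar = alpha_bar_schedule()

# Non-integer scale factor (16 -> 40 is 2.5x), as in the arbitrary-scale setting.
upsampled = upsample_nearest(low_res, 40, 40)

# t is chosen by hand here; in Diff-SR this choice is what the PRF analysis guides.
noisy = inject_noise(upsampled, t=300, alpha_bar=alpha_bar, rng=rng)
# A pre-trained DGM would then run its reverse process from step t on `noisy`.
```

A smaller `t` keeps more of the upsampled input but leaves less room for the model to synthesize detail, while a larger `t` does the opposite, which is the trade-off the summary describes.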