Abstract: We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.

What problem does this paper attempt to address?

This paper addresses how to use pre-trained diffusion models to generate high-quality, high-resolution images in the image super-resolution (SR) task while preserving the fidelity and details of the input images. Specifically, the authors propose StableSR, a method that achieves blind super-resolution by leveraging the prior knowledge in pre-trained text-to-image diffusion models. The main challenges are overcoming the inherent randomness of diffusion models and handling input images of any resolution while keeping the generated content faithful to the input.

### Main Contributions:

1. **Time-aware Encoder**: To achieve high-quality restoration without changing the pre-trained synthesis model, the authors design a time-aware encoder that adaptively adjusts features at different diffusion steps. It provides stronger guidance in the early iterations to maintain fidelity and weaker guidance in the later iterations to avoid introducing degradation.
2. **Controllable Feature Wrapping Module (CFW)**: To address the fidelity loss caused by the intrinsic randomness of diffusion models, the authors introduce a controllable feature wrapping module that lets users balance quality and fidelity by adjusting a scalar value.
3. **Progressive Aggregation Sampling Strategy**: To overcome the fixed-size limitation of pre-trained diffusion models, the authors develop a progressive aggregation sampling strategy that lets the model adapt to resolutions of any size. The image is divided into overlapping patches, and these patches are fused in each diffusion iteration to smooth the boundaries, which yields a more coherent output.

### Experimental Results:

The authors verify the effectiveness of StableSR through a series of experiments, including quantitative comparisons on synthetic and real-world datasets. The results show that StableSR outperforms existing state-of-the-art methods on multiple metrics, with particularly strong results on FID (Fréchet Inception Distance) and CLIP-IQA (CLIP-based Image Quality Assessment).

### Formula Examples:

- **Feature Modulation**:
  \[ \hat{F}_n^{\text{dif}} = (1 + \alpha_n) \odot F_n^{\text{dif}} + \beta_n; \quad \alpha_n, \beta_n = M_\theta(F_n) \]
  where $\alpha_n$ and $\beta_n$ are the affine parameters of the spatial feature transform (SFT), and $M_\theta$ is a small network containing several convolutional layers.
- **Color Correction**:
  \[ y_c = \frac{\hat{y}_c - \mu_c^{\hat{y}}}{\sigma_c^{\hat{y}}} \cdot \sigma_c^x + \mu_c^x \]
  where $c \in \{r, g, b\}$ denotes the color channel, $\mu_c^{\hat{y}}$ and $\sigma_c^{\hat{y}}$ are the mean and standard deviation of the $c$-th channel of the generated high-resolution image $\hat{y}$, and $\mu_c^x$ and $\sigma_c^x$ are the mean and standard deviation of the $c$-th channel of the low-resolution input $x$.

### Conclusion:

StableSR leverages the generative prior in pre-trained diffusion models to address the fidelity and arbitrary-resolution problems in image super-resolution.
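The color correction and the patch-fusion step of progressive aggregation sampling described in the summary can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The color correction is read as per-channel mean and standard deviation matching: subtract the output channel mean, divide by the output channel standard deviation, then rescale with the input channel statistics. The function names (`color_correct`, `tile_origins`, `fuse_tiles`), the `eps` guard, and the Gaussian blending window are all assumptions made for the example; the summary only states that overlapping patches are fused, not how they are weighted. In the method itself the fusion is applied at every diffusion iteration, whereas the sketch shows a single fusion pass.

```python
import numpy as np


def color_correct(y_hat, x, eps=1e-8):
    """Match the per-channel mean/std of the output y_hat to the input x.

    Both arrays are H x W x 3 floats. They may differ in spatial size,
    because only per-channel statistics are used.
    eps is an assumed guard against division by zero on flat channels.
    """
    mu_y = y_hat.mean(axis=(0, 1), keepdims=True)
    sigma_y = y_hat.std(axis=(0, 1), keepdims=True)
    mu_x = x.mean(axis=(0, 1), keepdims=True)
    sigma_x = x.std(axis=(0, 1), keepdims=True)
    return (y_hat - mu_y) / (sigma_y + eps) * sigma_x + mu_x


def tile_origins(length, tile, stride):
    """Start offsets of overlapping tiles that cover the range [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # the last tile sits flush with the border
    return starts


def fuse_tiles(tiles, origins, height, width, tile):
    """Blend overlapping (tile x tile x C) patches into one H x W x C array.

    Assumed weighting: a Gaussian window, so the centre of each patch
    contributes more than its borders where patches overlap.
    """
    ax = np.arange(tile) - (tile - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * (tile / 4.0) ** 2))
    weight = np.outer(g, g)[..., None]  # shape: tile x tile x 1

    channels = tiles[0].shape[-1]
    acc = np.zeros((height, width, channels))
    norm = np.zeros((height, width, 1))
    for patch, (top, left) in zip(tiles, origins):
        acc[top:top + tile, left:left + tile] += patch * weight
        norm[top:top + tile, left:left + tile] += weight
    # Every pixel is covered by at least one patch, so norm is positive.
    return acc / norm
```

A typical use would build the patch grid with `tile_origins` along each axis, process each patch independently, pass the results to `fuse_tiles`, and finally apply `color_correct` against the low-resolution input. Normalizing by the accumulated weights means that regions where the patches agree are reproduced unchanged, and only disagreements near patch borders are smoothed.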

Exploiting Diffusion Prior for Real-World Image Super-Resolution
