Abstract: In this paper, we introduce YONOS-SR, a novel Stable Diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby making the SR problem simpler for the teacher. We then train a student model for a higher magnification scale, using the predictions of the teacher as targets during training. This process is repeated iteratively until we reach the target scale factor of the final model. The rationale behind scale distillation is that the teacher aids the training of the student diffusion model by i) providing a target adapted to the current noise level, rather than the same ground-truth target for all noise levels, and ii) providing an accurate target, since the teacher has a simpler task to solve. We empirically show that the distilled model significantly outperforms a model trained directly for high scales, particularly when only a few inference steps are used. Having a strong diffusion model that requires only one step allows us to freeze the U-Net and fine-tune the decoder on top of it. We show that the combination of the scale-distilled U-Net and the fine-tuned decoder, using only a single step, outperforms state-of-the-art methods that require 200 steps.
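The iterative teacher-student procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the magnification doubles in each round (2-fold, then 4-fold, then 8-fold), uses a toy 1-D block-average `downsample` in place of real image degradation, and represents the U-Nets as plain callables on lists. The names `distillation_schedule`, `downsample`, and `scale_distillation_loss` are hypothetical. The point it shows is that the student and the frozen teacher see the same noisy latent `x_t` at timestep `t`, but the teacher is conditioned on a less degraded input, so the regression target varies with the noise level instead of being the fixed ground truth.

```python
from typing import Callable, List, Sequence, Tuple

# A denoiser maps (noisy latent, timestep, low-res conditioning) -> prediction.
# Real models are U-Nets on image latents; plain lists keep the sketch runnable.
Denoiser = Callable[[Sequence[float], int, Sequence[float]], List[float]]


def distillation_schedule(target_scale: int, base_scale: int = 2) -> List[Tuple[int, int]]:
    """Return (teacher_scale, student_scale) pairs, one per distillation round.

    Assumption: each round doubles the magnification (e.g. 2 -> 4 -> 8). The
    base_scale model is trained on ground truth; each trained student then
    serves as the teacher for the next round.
    """
    pairs = []
    scale = base_scale
    while scale * 2 <= target_scale:
        pairs.append((scale, scale * 2))
        scale *= 2
    if scale != target_scale:
        raise ValueError("target_scale must equal base_scale times a power of two")
    return pairs


def downsample(signal: Sequence[float], factor: int) -> List[float]:
    """Toy 1-D degradation: average non-overlapping blocks of length `factor`."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def scale_distillation_loss(student: Denoiser, teacher: Denoiser,
                            x_t: Sequence[float], t: int, hr: Sequence[float],
                            student_scale: int, teacher_scale: int) -> float:
    """Mean squared error between the student's and the frozen teacher's predictions.

    Both models receive the same noisy latent x_t at timestep t. The teacher is
    conditioned on a less degraded input (smaller scale), so the target depends
    on t rather than being the same ground truth for every noise level.
    """
    pred = student(x_t, t, downsample(hr, student_scale))
    # In a real training loop the teacher is frozen (no gradients flow into it).
    target = teacher(x_t, t, downsample(hr, teacher_scale))
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    hr = [float(i) for i in range(8)]  # toy "high-resolution" signal
    x_t = [0.5] * 8                    # toy noisy latent

    def teacher(x, t, lr):  # hypothetical stand-in for a trained teacher U-Net
        return [v + 1.0 for v in x]

    def student(x, t, lr):  # hypothetical stand-in for the student U-Net
        return list(x)

    print(distillation_schedule(8))  # [(2, 4), (4, 8)]
    print(scale_distillation_loss(student, teacher, x_t, 500, hr,
                                  student_scale=4, teacher_scale=2))  # 1.0
```

Conditioning the teacher on the easier, smaller-scale input is what makes its prediction a more accurate target than the student could produce on its own; repeating the round with the student promoted to teacher reaches the final scale factor.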

What problem does this paper attempt to address?

The main problem this paper addresses is how to drastically reduce the number of inference steps a diffusion model needs while preserving high-quality image super-resolution (SR). Specifically, the authors propose YONOS-SR, which uses a scale distillation technique so that a diffusion model can match or exceed the SR quality of conventional multi-step models with only a single DDIM step.

### Specific description of the problem

1. **Computational efficiency**:
   - Diffusion models usually require many sequential denoising steps to generate an image, which makes them computationally expensive, especially at high resolutions.
   - For 4-fold or 8-fold magnification, existing methods may need to split the input image into multiple tiles and process each one separately, which makes inference very time-consuming.
2. **Performance degradation with few inference steps**:
   - The performance of existing methods drops sharply as the number of inference steps is reduced; with only one step, image quality deteriorates significantly.
3. **Difficulty grows with magnification**:
   - The SR task becomes harder as the magnification factor increases. For example, 4-fold magnification is harder than 2-fold magnification because the low-resolution input is more heavily degraded.

### Overview of the solution

To address these problems, the paper proposes the following:

1. **Scale distillation**:
   - Simplify the SR task by training teacher and student models progressively: first train a teacher at a lower magnification factor, then use the teacher's predictions as the supervision signal for a student at a higher magnification factor.
   - The student thus learns from targets that are more accurate (the teacher solves an easier task) and adapted to each noise level, which enables high-quality SR with fewer inference steps.
2. **Single-step inference**:
   - Because the distilled U-Net performs well with one step, it can be frozen and the decoder fine-tuned on top of it; the combination generates high-quality high-resolution images in a single inference step.
3. **Experimental validation**:
   - Extensive experiments on datasets including DIV2K and ImageNet demonstrate the method's strong performance across magnification factors, with particularly large gains at 8-fold magnification.

### Summary

The core objective of this paper is to drastically reduce the number of inference steps diffusion models need for image super-resolution, using scale distillation, while maintaining or even improving the quality of the generated images. This improves computational efficiency and makes diffusion-based SR more practical for real-world applications.
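The single-step inference discussed above can be illustrated with the standard deterministic DDIM update (eta = 0). This is a sketch under assumptions, not the paper's code: it assumes an epsilon-prediction parameterization (some Stable Diffusion variants use v-prediction instead), assumes the final cumulative noise level is set to 1 so that one step from timestep T lands directly on the clean-latent estimate, and uses hypothetical names `ddim_step` and `one_step_sr`. Latents are plain Python lists, and the fine-tuned decoder that maps the latent back to pixels is not modeled.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical noise predictor: (noisy latent, low-res conditioning) -> predicted noise.
EpsModel = Callable[[Sequence[float], Sequence[float]], List[float]]


def ddim_step(x_t: Sequence[float], eps_hat: Sequence[float],
              alpha_bar_t: float, alpha_bar_prev: float) -> List[float]:
    """Deterministic DDIM update (eta = 0), applied element-wise."""
    # Estimate the clean latent from the noisy latent and the predicted noise.
    x0_hat = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
              for x, e in zip(x_t, eps_hat)]
    # Re-noise the estimate to the previous (less noisy) timestep.
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_hat, eps_hat)]


def one_step_sr(eps_model: EpsModel, x_T: Sequence[float],
                alpha_bar_T: float, lr: Sequence[float]) -> List[float]:
    """Single DDIM step from timestep T straight to a clean latent.

    Assumption: the final cumulative alpha is 1, so the step returns x0_hat.
    The result would then be passed to the (fine-tuned) decoder, not modeled here.
    """
    return ddim_step(x_T, eps_model(x_T, lr), alpha_bar_T, alpha_bar_prev=1.0)


if __name__ == "__main__":
    x0 = [1.0, -2.0]      # toy clean latent
    noise = [0.5, 0.25]   # toy Gaussian noise sample
    alpha_bar_T = 0.25
    x_T = [math.sqrt(alpha_bar_T) * a + math.sqrt(1.0 - alpha_bar_T) * n
           for a, n in zip(x0, noise)]

    def oracle(x, lr):  # hypothetical predictor that returns the true noise
        return noise

    print(one_step_sr(oracle, x_T, alpha_bar_T, lr=[]))
```

With an oracle noise predictor, one step recovers the clean latent up to floating-point error, so single-step quality depends entirely on how accurate the U-Net's single prediction is; this is the regime in which the abstract reports the distilled model's largest advantage. In cost terms, each image tile then needs one U-Net evaluation instead of one per step, for example 1 versus the 200 steps of the baselines mentioned in the abstract.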

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

One Step Diffusion-based Super-Resolution with Time-Aware Distillation

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution

Dynamic Attention-Guided Diffusion for Image Super-Resolution

Arbitrary-steps Image Super-resolution via Diffusion Inversion

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

NoUCSR: Efficient Super-Resolution Network Without Upsampling Convolution

DDistill-SR: Reparameterized Dynamic Distillation Network for Lightweight Image Super-Resolution

Towards Compact Single Image Super-Resolution Via Contrastive Self-distillation

A Conditional Diffusion Model With Fast Sampling Strategy for Remote Sensing Image Super-Resolution

Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

SRDiff: Single image super-resolution with diffusion probabilistic models

Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution