Abstract: Generative diffusion models (DMs) have been extensively utilized in image super-resolution (ISR). Most of the existing methods adopt the denoising loss from DDPMs for model optimization. We posit that introducing reward feedback learning to finetune the existing models can further improve the quality of the generated images. In this paper, we propose a timestep-aware training strategy with reward feedback learning. Specifically, in the initial denoising stages of ISR diffusion, we apply low-frequency constraints to super-resolution (SR) images to maintain structural stability. In the later denoising stages, we use reward feedback learning to improve the perceptual and aesthetic quality of the SR images. In addition, we incorporate Gram-KL regularization to alleviate stylization caused by reward hacking. Our method can be integrated into any diffusion-based ISR model in a plug-and-play manner. Experiments show that ISR diffusion models, when fine-tuned with our method, significantly improve the perceptual and aesthetic quality of SR images, achieving excellent subjective results. Code: https://github.com/sxpro/RFSR


### What problems does this paper attempt to solve?

This paper aims to improve diffusion-based image super-resolution (ISR) by introducing Reward Feedback Learning (RFL). Specifically, the authors propose a timestep-aware training strategy to further enhance the quality of images generated by ISR diffusion models.

#### Main problems and solutions

1. **Limitations of existing methods**:
   - Most current ISR diffusion models rely mainly on the denoising loss, which limits the perceptual and aesthetic quality of the generated images.
   - Existing ISR methods are prone to distorting high-frequency information in the later denoising stages, so the details of the generated image deviate from those of the real image.
2. **Introduction of Reward Feedback Learning**:
   - To address this, the authors introduce Reward Feedback Learning into the ISR diffusion model and optimize it with a combination of subjective and objective reward models.
   - In the early denoising stages, low-frequency constraints keep the image structure stable; in the later denoising stages, Reward Feedback Learning improves the perceptual and aesthetic quality of the generated images.
3. **Solving the reward-hacking problem**:
   - Applying Reward Feedback Learning directly can lead to "reward hacking": the perceptual metric is high, but the actual visual quality decreases.
   - For this reason, the authors introduce Gram-KL regularization to alleviate the image stylization caused by reward hacking.

#### Method overview

1. **Low-Frequency Structure Constraint**:
   - Use the Discrete Wavelet Transform (DWT) to extract the low-frequency (LL) component of the image, and define the low-frequency constraint \( L_{dwtll} \) to keep the image structure stable.
   \[ L_{dwtll} = \left| \text{DWT}(I_{gt})_{LL} - \text{DWT}(I_t)_{LL} \right| \]
2. **Reward Feedback Learning**:
   - Define the reward loss \( L_{reward} \), which combines two reward models, CLIP-IQA and ImageReward (IW), to improve the perceptual and aesthetic quality of the generated images.
   \[ L_{reward} = \lambda_{clipiqa} L_{\text{CLIP-IQA}}(I_t) + \lambda_{iw} L_{IW}(c_t, I_t) \]
3. **Alleviating Reward Hacking**:
   - Introduce Gram-KL regularization: compute the difference between the Gram matrix of the image generated by the fine-tuned model \( G_\theta \) and that of the image generated by the pre-trained model \( G'_\theta \) to suppress stylization.
   \[ L_{\text{gram-kl}} = \| \text{Gram}(VGG(G_\theta(z_t, I_{lr}, t, c_v, c_t))) - \text{Gram}(VGG(G'_\theta(z_t, I_{lr}, t, c_v, c_t))) \|_2^2 \]
4. **Timestep-aware Training**:
   - Select the loss function according to the timestep: use the low-frequency constraint in the early denoising stages, and use Reward Feedback Learning with Gram-KL regularization in the later denoising stages.
   \[ \text{Loss} =
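The low-frequency structure constraint from the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a single-level Haar wavelet, a single-channel image with even height and width, and a mean reduction over the LL band, none of which the summary specifies. The function names `haar_ll` and `dwt_ll_loss` are hypothetical.

```python
import numpy as np


def haar_ll(img: np.ndarray) -> np.ndarray:
    """Return the LL (approximation) sub-band of a single-level 2D Haar DWT.

    Assumption: `img` is a 2D array with even height and width.
    """
    h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Group pixels into non-overlapping 2x2 blocks and sum each block.
    # The orthonormal Haar low-pass filter scales that sum by 1/2.
    blocks = img.reshape(h // 2, 2, w // 2, 2)
    return blocks.sum(axis=(1, 3)) / 2.0


def dwt_ll_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute difference between the LL sub-bands of two images.

    Assumption: the mean reduction is a choice made for this sketch.
    """
    return float(np.mean(np.abs(haar_ll(target) - haar_ll(pred))))
```

Because the LL band only sees 2x2 block sums, a perturbation that cancels within each block (for example a ±1 checkerboard) leaves this loss at zero, while a constant brightness shift is penalized. That matches the stated purpose of the constraint: it pins down coarse structure and leaves fine detail free to change.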

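The Gram-matrix term in the regularization formula above can be sketched the same way. The sketch follows the squared-difference form given in the summary and takes pre-computed feature maps as input, so the VGG feature extractor and the two generators are left out. The normalization by `C*H*W` is an assumption borrowed from common style-transfer practice, and the function names are hypothetical.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map.

    Assumption: normalized by C*H*W, as is common in style-transfer code.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # one row per channel
    return flat @ flat.T / (c * h * w)


def gram_regularizer(feat_tuned: np.ndarray, feat_ref: np.ndarray) -> float:
    """Squared difference between the Gram matrices of two feature maps.

    `feat_tuned` stands in for features of the fine-tuned model's output and
    `feat_ref` for features of the pre-trained model's output.
    """
    diff = gram_matrix(feat_tuned) - gram_matrix(feat_ref)
    return float(np.sum(diff ** 2))
```

A Gram matrix records channel co-activation statistics and discards spatial layout, so flipping a feature map spatially leaves the regularizer at zero, whereas a global change in feature statistics (a style shift) increases it. This is why such a term is a plausible guard against stylization.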
RFSR: Improving ISR Diffusion Models via Reward Feedback Learning
