RFSR: Improving ISR Diffusion Models via Reward Feedback Learning

Xiaopeng Sun, Qinwei Lin, Yu Gao, Yujie Zhong, Chengjian Feng, Dengjie Li, Zheng Zhao, Jie Hu, Lin Ma
2024-12-04
Abstract: Generative diffusion models (DM) have been extensively utilized in image super-resolution (ISR). Most of the existing methods adopt the denoising loss from DDPMs for model optimization. We posit that introducing reward feedback learning to finetune the existing models can further improve the quality of the generated images. In this paper, we propose a timestep-aware training strategy with reward feedback learning. Specifically, in the initial denoising stages of ISR diffusion, we apply low-frequency constraints to super-resolution (SR) images to maintain structural stability. In the later denoising stages, we use reward feedback learning to improve the perceptual and aesthetic quality of the SR images. In addition, we incorporate Gram-KL regularization to alleviate stylization caused by reward hacking. Our method can be integrated into any diffusion-based ISR model in a plug-and-play manner. Experiments show that ISR diffusion models, when fine-tuned with our method, significantly improve the perceptual and aesthetic quality of SR images, achieving excellent subjective results. Code: https://github.com/sxpro/RFSR
Computer Vision and Pattern Recognition
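The timestep-aware strategy described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: a single-level Haar wavelet for the low-frequency (LL) band, mean reduction for the absolute difference, pre-computed feature maps standing in for VGG activations, a Gram normalization by the feature size, the hypothetical names `t_switch` and `lambda_gram`, the convention that large `t` means an early denoising step, and a reward loss passed in as a plain callable.

```python
import numpy as np


def haar_ll(img):
    """LL band of a single-level 2D orthonormal Haar DWT; img is (H, W), H and W even."""
    h, w = img.shape
    blocks = img.reshape(h // 2, 2, w // 2, 2)
    # Each LL coefficient is the sum of one 2x2 block divided by 2.
    return blocks.sum(axis=(1, 3)) / 2.0


def low_freq_loss(i_gt, i_t):
    """Mean absolute difference of the LL bands (mean reduction is an assumption)."""
    return float(np.abs(haar_ll(i_gt) - haar_ll(i_t)).mean())


def gram(features):
    """Gram matrix of a (C, H, W) feature map, normalized by its size (assumption)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def gram_reg(feats, feats_ref):
    """Squared difference between Gram matrices of fine-tuned and reference features."""
    return float(((gram(feats) - gram(feats_ref)) ** 2).sum())


def timestep_aware_loss(t, t_switch, i_gt, i_t, reward_loss_fn,
                        feats, feats_ref, lambda_gram=1.0):
    """Pick the loss by timestep; t_switch and lambda_gram are hypothetical knobs."""
    if t > t_switch:
        # Early denoising (large t, by assumption): constrain low-frequency structure.
        return low_freq_loss(i_gt, i_t)
    # Later denoising: reward feedback plus the Gram-based regularizer.
    return reward_loss_fn(i_t) + lambda_gram * gram_reg(feats, feats_ref)
```

In a real fine-tuning loop, `reward_loss_fn` would wrap differentiable reward models and the feature maps would come from a frozen feature extractor, so the same branching would be written in an autodiff framework rather than NumPy.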
### What problems does this paper attempt to solve?

This paper aims to improve diffusion-based image super-resolution (ISR) by introducing Reward Feedback Learning (RFL). Specifically, the authors propose a timestep-aware training strategy to further enhance the quality of the images generated by ISR diffusion models.

#### Main problems and solutions

1. **Limitations of existing methods**:
   - Most current ISR diffusion models rely mainly on the denoising loss, which limits the perceptual and aesthetic quality of the generated images.
   - Existing ISR methods are prone to distorting high-frequency information in the later denoising stages, so the details of the generated image deviate from those of the real image.
2. **Introduction of Reward Feedback Learning**:
   - To address this, the authors introduce Reward Feedback Learning into the ISR diffusion model and optimize it with a combination of subjective and objective reward models.
   - In the early denoising stages, low-frequency constraints maintain the stability of the image structure; in the later denoising stages, Reward Feedback Learning improves the perceptual and aesthetic quality of the generated images.
3. **Solving the reward-hacking problem**:
   - Applying Reward Feedback Learning directly may lead to "reward hacking": the perceptual metric is high, but the actual visual quality decreases.
   - For this reason, the authors introduce Gram-KL regularization to alleviate the image stylization caused by reward hacking.

#### Method overview

1. **Low-Frequency Structure Constraint**:
   - Use the Discrete Wavelet Transform (DWT) to extract the low-frequency (LL) components of the image, and define the low-frequency constraint \( L_{dwtll} \) to maintain the stability of the image structure.

   \[ L_{dwtll} = \left| \mathrm{DWT}(I_{gt})_{LL} - \mathrm{DWT}(I_t)_{LL} \right| \]
2. **Reward Feedback Learning**:
   - Define the reward loss \( L_{reward} \), which combines two reward models, CLIP-IQA and ImageReward (IW), to improve the perceptual and aesthetic quality of the generated images.

   \[ L_{reward} = \lambda_{clipiqa} L_{\text{CLIP-IQA}}(I_t) + \lambda_{iw} L_{IW}(c_t, I_t) \]
3. **Alleviating Reward Hacking**:
   - Introduce Gram-KL regularization: compute the difference between the Gram matrix of the image generated by the fine-tuned model and that of the image generated by the pre-trained model, in order to suppress stylization.

   \[ L_{\text{gram-kl}} = \left\| \text{Gram}(VGG(G_\theta(z_t, I_{lr}, t, c_v, c_t))) - \text{Gram}(VGG(G'_\theta(z_t, I_{lr}, t, c_v, c_t))) \right\|_2^2 \]
4. **Timestep-aware Training**:
   - Select the loss function according to the timestep: use the low-frequency constraint in the early denoising stages, and use Reward Feedback Learning together with Gram-KL regularization in the later denoising stages.

   \[ \text{Loss} =