Abstract: Diffusion prior-based methods have shown impressive results in real-world image super-resolution (SR). However, most existing methods entangle pixel-level and semantic-level SR objectives in the training process and struggle to balance pixel-wise fidelity and perceptual quality. Meanwhile, users have varying preferences for SR results, so there is a need for an adjustable SR model that can be tailored to different fidelity-perception preferences during inference without re-training. We present Pixel-level and Semantic-level Adjustable SR (PiSA-SR), which learns two LoRA modules on top of the pre-trained Stable Diffusion (SD) model to achieve improved and adjustable SR results. We first formulate the SD-based SR problem as learning the residual between the low-quality input and the high-quality output, then show that the learning objective can be decoupled into two distinct LoRA weight spaces: one is characterized by the $\ell_2$-loss for pixel-level regression, and the other by the LPIPS and classifier score distillation losses, which extract semantic information from pre-trained classification and SD models. In its default setting, PiSA-SR runs in a single diffusion step, achieving leading real-world SR results in both quality and efficiency. By introducing two adjustable guidance scales on the two LoRA modules to control the strengths of pixel-wise fidelity and semantic-level details during inference, PiSA-SR can offer flexible SR results according to user preference without re-training. Code and models are available at https://github.com/csslc/PiSA-SR.
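The residual formulation and the two decoupled LoRA weight spaces described in the abstract can be written compactly as below. This is a sketch in our own notation, assuming the residual is taken in the SD latent space and predicted by the UNet $\epsilon_\theta$ in a single pass; the symbols $z_L$, $z_H$, $\hat{x}_H$, $\Delta W_{pix}$, $\Delta W_{sem}$ and the weight $\lambda_{CSD}$ are illustrative assumptions, and the paper's exact definitions may differ.

$$
z_H \approx z_L - \epsilon_\theta(z_L), \qquad
W = W_0 + \Delta W_{pix} + \Delta W_{sem}, \qquad
\Delta W = BA,\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

$$
\mathcal{L}_{pix} = \lVert \hat{x}_H - x_H \rVert_2^2, \qquad
\mathcal{L}_{sem} = \mathcal{L}_{LPIPS}(\hat{x}_H, x_H) + \lambda_{CSD}\, \mathcal{L}_{CSD}(\hat{x}_H)
$$

Here $W_0$ is a frozen pre-trained SD weight, $\hat{x}_H$ is the decoded SR output, $\Delta W_{pix}$ is the adapter associated with $\mathcal{L}_{pix}$ and $\Delta W_{sem}$ the adapter associated with $\mathcal{L}_{sem}$. Writing each update as a low-rank product $BA$ is what keeps the two weight spaces small and separately switchable at inference.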

What problem does this paper attempt to address?

The problem this paper addresses is how to balance pixel-level fidelity and perceptual quality in image super-resolution (SR). Most existing methods entangle pixel-level and semantic-level SR objectives during training, which makes it difficult to optimize both at once. In addition, users have different preferences for SR results, so a model is needed that can adjust the fidelity-perception balance at inference time without retraining. To address these challenges, the authors propose Pixel-level and Semantic-level Adjustable Super-Resolution (PiSA-SR). Built on the pre-trained Stable Diffusion (SD) model, it introduces two Low-Rank Adaptation (LoRA) modules so that the pixel-level and semantic-level objectives can be optimized independently. Specifically:

1. **Model Formulation**:
   - SD-based SR is formulated as learning the residual between the low-quality (LQ) input and the high-quality (HQ) output.
   - The training objective can be decomposed into two distinct LoRA weight spaces: one for pixel-level regression and the other for extracting semantic information.
2. **Dual-LoRA Training**:
   - The pixel-level and semantic-level objectives are decoupled by optimizing the two LoRA modules separately.
   - The pixel-level LoRA module is trained with the $\ell_2$ loss to improve pixel-level fidelity.
   - The semantic-level LoRA module is trained with the LPIPS and Classifier Score Distillation (CSD) losses to enhance semantic details.
3. **Flexible Inference Process**:
   - At inference, two adjustable guidance scales, $\lambda_{pix}$ and $\lambda_{sem}$, control the strength of pixel-level and semantic-level enhancement.
   - Users can tune these two parameters to obtain SR results in different styles without retraining.
4. **Experimental Verification**:
   - The effectiveness of PiSA-SR is verified on both synthetic and real-world datasets.
   - The results show that PiSA-SR outperforms existing diffusion-based SR methods and can flexibly adjust its output to match different user preferences.

In conclusion, by decoupling the pixel-level and semantic-level optimization objectives and providing a flexible adjustment mechanism, PiSA-SR improves perceptual quality while maintaining high pixel-level fidelity, and adapts to different user preferences.
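To make the adjustable inference with $\lambda_{pix}$ and $\lambda_{sem}$ concrete, here is a minimal NumPy sketch of a single linear layer carrying a frozen base weight plus two LoRA updates, and a function that mixes their residual predictions. It is a toy illustration, not the PiSA-SR code: the class `DualLoRALinear`, the function `adjustable_sr`, and the classifier-free-guidance-style combination rule (scale the pixel residual by $\lambda_{pix}$ and the semantic increment on top of it by $\lambda_{sem}$) are all our assumptions.

```python
import numpy as np


class DualLoRALinear:
    """Toy linear layer: a frozen base weight plus two low-rank (LoRA) updates.

    Hypothetical illustration of the dual-LoRA idea, not the PiSA-SR implementation.
    """

    def __init__(self, dim: int, rank: int, rng: np.random.Generator):
        # Stand-in for a frozen pre-trained SD weight matrix.
        self.W0 = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # Pixel-level adapter: Delta W_pix = B_pix @ A_pix, rank at most `rank`.
        self.A_pix = 0.1 * rng.standard_normal((rank, dim))
        self.B_pix = 0.1 * rng.standard_normal((dim, rank))
        # Semantic-level adapter: Delta W_sem = B_sem @ A_sem.
        self.A_sem = 0.1 * rng.standard_normal((rank, dim))
        self.B_sem = 0.1 * rng.standard_normal((dim, rank))

    def __call__(self, z: np.ndarray, use_pix: bool = True, use_sem: bool = True) -> np.ndarray:
        """Predict a residual for each row of z with the selected adapters enabled."""
        W = self.W0
        if use_pix:
            W = W + self.B_pix @ self.A_pix
        if use_sem:
            W = W + self.B_sem @ self.A_sem
        return z @ W.T


def adjustable_sr(layer: DualLoRALinear, z_lq: np.ndarray,
                  lambda_pix: float = 1.0, lambda_sem: float = 1.0) -> np.ndarray:
    """Assumed CFG-style rule: scale the pixel residual and the semantic increment separately."""
    eps_pix = layer(z_lq, use_pix=True, use_sem=False)   # pixel LoRA only
    eps_both = layer(z_lq, use_pix=True, use_sem=True)   # pixel + semantic LoRA
    return z_lq - lambda_pix * eps_pix - lambda_sem * (eps_both - eps_pix)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = DualLoRALinear(dim=8, rank=2, rng=rng)
    z_lq = rng.standard_normal((4, 8))  # stand-in for LQ latents
    for lp, ls in [(1.0, 0.0), (1.0, 1.0), (1.0, 1.5)]:
        z_hat = adjustable_sr(layer, z_lq, lp, ls)
        print(lp, ls, float(np.linalg.norm(z_hat - z_lq)))
```

Under this assumed rule, $\lambda_{pix} = \lambda_{sem} = 1$ reduces to subtracting the full dual-LoRA prediction (a single-step default), $\lambda_{sem} = 0$ keeps only the pixel-level correction, and $\lambda_{sem} > 1$ extrapolates along the semantic direction, analogous to how classifier-free guidance extrapolates along the conditional direction.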

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

Denoising Diffusion Probabilistic Model with Adversarial Learning for Remote Sensing Super-Resolution

Detail-Optimized Super-Resolution Reconstruction-Based Multistage Training Strategy for Remote Sensing Semantic Segmentation

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

SRDiff: Single image super-resolution with diffusion probabilistic models

A Conditional Diffusion Model With Fast Sampling Strategy for Remote Sensing Image Super-Resolution

ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Multi-Resolution Learning and Semantic Edge Enhancement for Super-Resolution Semantic Segmentation of Urban Scene Images

Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images

Dual Super-Resolution Learning for Semantic Segmentation

Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL

Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution