Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang
2024-12-04
Abstract: Diffusion prior-based methods have shown impressive results in real-world image super-resolution (SR). However, most existing methods entangle pixel-level and semantic-level SR objectives in the training process and struggle to balance pixel-wise fidelity and perceptual quality. Meanwhile, users have varying preferences on SR results, so it is desirable to develop an adjustable SR model that can be tailored to different fidelity-perception preferences during inference without re-training. We present Pixel-level and Semantic-level Adjustable SR (PiSA-SR), which learns two LoRA modules upon the pre-trained stable-diffusion (SD) model to achieve improved and adjustable SR results. We first formulate the SD-based SR problem as learning the residual between the low-quality input and the high-quality output, then show that the learning objective can be decoupled into two distinct LoRA weight spaces: one is characterized by the $\ell_2$-loss for pixel-level regression, and the other is characterized by the LPIPS and classifier score distillation losses to extract semantic information from pre-trained classification and SD models. In its default setting, PiSA-SR can be performed in a single diffusion step, achieving leading real-world SR results in both quality and efficiency. By introducing two adjustable guidance scales on the two LoRA modules to control the strengths of pixel-wise fidelity and semantic-level details during inference, PiSA-SR can offer flexible SR results according to user preference without re-training. Codes and models can be found at https://github.com/csslc/PiSA-SR.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to balance pixel-level fidelity and perceptual quality in image super-resolution (SR). Most existing methods entangle the pixel-level and semantic-level SR objectives during training, which makes it difficult to optimize both at once. In addition, users have different preferences for SR results, so a model is needed that can adjust the fidelity-perception balance at inference time without retraining.

To address these challenges, the authors propose Pixel-level and Semantic-level Adjustable Super-Resolution (PiSA-SR). The model builds on the pre-trained Stable Diffusion (SD) model and optimizes the pixel-level and semantic-level objectives independently by introducing two Low-Rank Adapter (LoRA) modules. Specifically:

1. **Model Formulation**:
   - The SD-based SR problem is formulated as learning the residual between the low-quality (LQ) input and the high-quality (HQ) output.
   - The training objective can be decomposed into two distinct LoRA weight spaces: one for pixel-level regression and the other for extracting semantic information.
2. **Dual-LoRA Training**:
   - The pixel-level and semantic-level objectives are decoupled by optimizing the two LoRA modules separately.
   - The pixel-level LoRA module is trained with the $\ell_2$ loss to improve pixel-level fidelity.
   - The semantic-level LoRA module is trained with the LPIPS and Classifier Score Distillation (CSD) losses to enhance semantic details.
3. **Flexible Inference Process**:
   - At inference time, two adjustable guidance scales, $\lambda_{pix}$ and $\lambda_{sem}$, control the strength of pixel-level and semantic-level enhancement.
   - Users can adjust these two parameters according to their own preferences to obtain SR results in different styles.
4. **Experimental Verification**:
   - The effectiveness of PiSA-SR is verified through experiments on synthetic and real-world datasets.
   - The results show that PiSA-SR outperforms existing diffusion-model-based SR methods and can flexibly adjust its output to meet different user preferences.

In conclusion, this paper proposes a new SR method. By decoupling the pixel-level and semantic-level optimization objectives and providing a flexible adjustment mechanism, the model can improve perceptual quality while maintaining high pixel-level fidelity and can adapt to different user preferences.
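The inference-time adjustment described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation. The summary only says that two guidance scales control the two LoRA modules, so the combination rule here is an assumption: subtract the scaled pixel-level residual, then subtract the scaled extra residual contributed by the semantic LoRA. The linear "network" and the names `lora_delta`, `predict_residual`, `adjustable_sr`, `lam_pix`, and `lam_sem` are hypothetical stand-ins for the SD backbone and for the two guidance scales.

```python
import numpy as np


def lora_delta(down, up):
    """Low-rank weight update: up @ down has the shape of the frozen weight."""
    return up @ down


def predict_residual(weight, z_lq):
    """Toy stand-in for the network: a linear map predicting the LQ-to-HQ residual."""
    return weight @ z_lq


def adjustable_sr(z_lq, w_base, delta_pix, delta_sem, lam_pix=1.0, lam_sem=1.0):
    """Combine two LoRA-conditioned residuals with two guidance scales (assumed rule)."""
    # Residual predicted with only the pixel-level LoRA active.
    res_pix = predict_residual(w_base + delta_pix, z_lq)
    # Residual predicted with both LoRA modules active.
    res_full = predict_residual(w_base + delta_pix + delta_sem, z_lq)
    # Extra residual attributed to the semantic-level LoRA.
    res_sem = res_full - res_pix
    # Assumption: the output is the LQ latent minus the scaled residuals.
    return z_lq - lam_pix * res_pix - lam_sem * res_sem


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, rank = 8, 2
    z_lq = rng.normal(size=dim)               # toy low-quality latent
    w_base = rng.normal(size=(dim, dim))      # frozen pre-trained weight
    delta_pix = lora_delta(rng.normal(size=(rank, dim)), rng.normal(size=(dim, rank)))
    delta_sem = lora_delta(rng.normal(size=(rank, dim)), rng.normal(size=(dim, rank)))

    # Sweep the semantic scale while keeping the pixel scale fixed.
    for lam_sem in (0.0, 0.5, 1.0):
        z_hq = adjustable_sr(z_lq, w_base, delta_pix, delta_sem, 1.0, lam_sem)
        print(lam_sem, float(np.linalg.norm(z_hq - z_lq)))
```

Under this assumed rule, setting both scales to 1 reproduces the output of the model with both LoRA modules active, setting `lam_sem` to 0 falls back to the pixel-level-only output, and intermediate values interpolate between the two without any retraining.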