A Practical Contrastive Learning Framework for Single-Image Super-Resolution

Gang Wu, Junjun Jiang, Xianming Liu
DOI: https://doi.org/10.1109/TNNLS.2023.3290038
2023-07-17
Abstract: Contrastive learning has achieved remarkable success on various high-level tasks, but fewer contrastive learning-based methods have been proposed for low-level tasks. It is challenging to directly adopt vanilla contrastive learning techniques designed for high-level visual tasks in low-level image restoration problems, because the acquired high-level global visual representations are insufficient for low-level tasks that require rich texture and context information. In this paper, we investigate contrastive learning-based single image super-resolution (SISR) from two perspectives: positive and negative sample construction, and feature embedding. Existing methods take naive sample construction approaches (e.g., treating the low-quality input as a negative sample and the ground truth as a positive sample) and adopt a prior model (e.g., a pre-trained VGG model) to obtain the feature embedding. To this end, we propose a practical contrastive learning framework for SISR, named PCL-SR. We generate many informative positive samples and hard negative samples in frequency space. Instead of using an additional pre-trained network, we design a simple but effective embedding network inherited from the discriminator network, which is more task-friendly. We re-train existing benchmark methods with the proposed PCL-SR framework and achieve superior performance compared with the originals. Extensive experiments, including thorough ablation studies, demonstrate the effectiveness and technical contributions of PCL-SR. The code and pre-trained models can be found at https://github.com/Aitical/PCL-SISR.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily focuses on addressing issues in the Single Image Super-Resolution (SISR) task, particularly on how to leverage contrastive learning to improve existing SISR methods. Specifically, the paper aims to solve the following key problems:

1. **How to effectively generate positive and negative samples**: Traditional contrastive learning methods face challenges when dealing with low-level vision tasks because directly applying contrastive learning techniques used in high-level vision tasks often fails to capture the rich textures and contextual information required. To address this, the authors propose a new sample generation strategy, including generating multiple information-rich positive samples (by sharpening high-resolution images) and hard-to-distinguish negative samples (by slightly blurring high-resolution images), to encourage the network to learn more details.
2. **How to design a task-appropriate feature embedding network**: Existing methods typically rely on pre-trained VGG networks as feature embedding networks, which may not be ideal because VGG networks tend to extract high-level semantic information rather than task-specific information. Therefore, the paper proposes using the discriminator network in the super-resolution network as the feature embedding network, which is more suitable for the SISR task and can better capture detail changes.
3. **How to apply contrastive learning to the SISR task**: Through the aforementioned sample generation strategy and feature embedding network design, the paper constructs a practical contrastive learning framework (PCL-SR) to improve the quality of SISR results. This framework not only generates multiple positive samples and hard-to-distinguish negative samples but also uses multi-layer intermediate features to calculate contrastive loss, enabling the model to learn useful information at different levels.

In summary, the core objective of the paper is to improve the quality of results in the Single Image Super-Resolution task by introducing a new contrastive learning framework, particularly in generating finer and more realistic images.
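
The sample construction and multi-layer contrastive loss described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: unsharp masking stands in for the sharpening step, a Gaussian filter for the slight blurring, the hypothetical `embed` function (image gradients at two scales) stands in for the discriminator-derived embedding network, and the loss is a generic InfoNCE-style form with cosine similarity. All function names and parameter values (`make_samples`, `neg_sigmas`, `pos_amounts`, `tau`) are likewise hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, sigma):
    """Gaussian blur of a 2-D grayscale image, using reflect padding."""
    size = 2 * int(np.ceil(3 * sigma)) + 1
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(size):
        for j in range(size):
            out += k[i, j] * padded[i:i + h, j:j + w]
    return out

def make_samples(hr, neg_sigmas=(0.5, 1.0), pos_amounts=(0.5, 1.0)):
    """Assumed construction: negatives are slightly blurred copies of the HR
    image, positives are sharpened copies (unsharp masking)."""
    negatives = [blur(hr, s) for s in neg_sigmas]
    detail = hr - blur(hr, 1.0)  # high-frequency residual
    positives = [np.clip(hr + a * detail, 0.0, 1.0) for a in pos_amounts]
    return positives, negatives

def embed(img):
    """Hypothetical stand-in for the embedding network: gradient features
    at full resolution and at a 2x subsampled resolution ("two layers")."""
    def grads(x):
        return np.concatenate([np.diff(x, axis=1).ravel(),
                               np.diff(x, axis=0).ravel()])
    return [grads(img), grads(img[::2, ::2])]

def cosine(a, b, eps=1e-8):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def contrastive_loss(sr, positives, negatives, tau=0.1):
    """Generic InfoNCE-style loss, averaged over layers and positives."""
    f_sr = embed(sr)
    f_pos = [embed(p) for p in positives]
    f_neg = [embed(n) for n in negatives]
    total = 0.0
    for layer in range(len(f_sr)):
        neg_sum = sum(np.exp(cosine(f_sr[layer], n[layer]) / tau) for n in f_neg)
        for p in f_pos:
            pos = np.exp(cosine(f_sr[layer], p[layer]) / tau)
            total += -np.log(pos / (pos + neg_sum))
    return total / (len(f_sr) * len(f_pos))

# Usage: compare a sharp candidate output with a blurry one.
rng = np.random.default_rng(0)
hr = 0.25 + 0.5 * rng.random((32, 32))  # synthetic stand-in for an HR image
positives, negatives = make_samples(hr)
loss_sharp = contrastive_loss(hr, positives, negatives)
loss_blurry = contrastive_loss(blur(hr, 1.0), positives, negatives)
print(loss_sharp, loss_blurry)
```

Under these assumptions, a candidate output that matches one of the blurred negatives should incur a larger loss than one that retains the high-frequency detail, which is the behavior the hard negatives are meant to encourage. In the actual framework the features would come from the discriminator-derived embedding network rather than hand-crafted gradients.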