Abstract: Reference-based Super-Resolution (Ref-SR) has recently emerged as a promising paradigm for enhancing a low-resolution (LR) input image or video by introducing an additional high-resolution (HR) reference image. Existing Ref-SR methods mostly rely on implicit correspondence matching to borrow HR textures from reference images and compensate for the information lost in the input images. However, this local transfer is difficult because of two gaps between input and reference images: the transformation gap (e.g., scale and rotation) and the resolution gap (e.g., HR vs. LR). To tackle these challenges, we propose C2-Matching, which performs explicit, robust matching across both the transformation and resolution gaps. 1) To bridge the transformation gap, we propose a contrastive correspondence network, which learns transformation-robust correspondences using augmented views of the input image. 2) To address the resolution gap, we adopt teacher-student correlation distillation, which distills knowledge from the easier HR-HR matching to guide the more ambiguous LR-HR matching. 3) Finally, we design a dynamic aggregation module to address potential misalignment between input images and reference images. In addition, to faithfully evaluate Reference-based Image Super-Resolution (Ref Image SR) under a realistic setting, we contribute the Webly-Referenced SR (WR-SR) dataset, which mimics the practical usage scenario. We also extend C2-Matching to the Reference-based Video Super-Resolution (Ref VSR) task, where an image taken in a similar scene serves as the HR reference image. Extensive experiments demonstrate that C2-Matching significantly outperforms state-of-the-art methods by up to 0.7 dB on the standard CUFED5 benchmark, and that incorporating the C2-Matching component into Video SR pipelines also boosts video super-resolution performance.
Notably, C2-Matching also shows strong generalizability on the WR-SR dataset, as well as robustness to large scale and rotation transformations. Code and datasets are available at https://github.com/yumingj/C2-Matching.
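The two matching ideas in the abstract (contrastive correspondence learning from augmented views, and teacher-student correlation distillation from HR-HR to LR-HR matching) can be illustrated with a small NumPy sketch. This is not the authors' implementation; the official code lives in the repository linked above. The sketch assumes a generic InfoNCE-style contrastive loss and a KL-divergence loss between row-wise softmaxed correlation volumes. Descriptors are flattened to `(positions, channels)` arrays, and the function names, temperatures `tau`, and toy data are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each descriptor to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_correspondence_loss(feat_in, feat_aug, match_idx, tau=0.07):
    """InfoNCE-style loss (assumed formulation, not taken from the paper).

    feat_in:   (N, C) descriptors sampled from the input image.
    feat_aug:  (M, C) descriptors from an augmented (e.g. rotated/scaled) view.
    match_idx: (N,) index into feat_aug of each input descriptor's true match,
               known because the augmentation that produced the view is known.
    The true match is the positive; every other position in feat_aug is a negative.
    """
    logits = l2_normalize(feat_in) @ l2_normalize(feat_aug).T / tau  # (N, M)
    logp = log_softmax(logits, axis=1)
    return -logp[np.arange(len(match_idx)), match_idx].mean()


def correlation_distillation_loss(teacher_feat, student_feat, ref_feat, tau=0.1):
    """KL(teacher || student) between correlation distributions over reference positions.

    teacher_feat: (N, C) descriptors of the HR input (easier HR-HR matching).
    student_feat: (N, C) descriptors of the LR input (harder LR-HR matching).
    ref_feat:     (K, C) descriptors of the HR reference image.
    """
    ref_n = l2_normalize(ref_feat)
    t_logp = log_softmax(l2_normalize(teacher_feat) @ ref_n.T / tau, axis=1)
    s_logp = log_softmax(l2_normalize(student_feat) @ ref_n.T / tau, axis=1)
    # In training one would typically stop gradients through the teacher;
    # this sketch only computes the loss value.
    return (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "augmented view": the same descriptors in shuffled order plus small noise.
    feat_in = rng.normal(size=(16, 32))
    perm = rng.permutation(16)
    feat_aug = feat_in[perm] + 0.05 * rng.normal(size=(16, 32))
    match_idx = np.argsort(perm)  # inverse permutation: feat_aug[match_idx[i]] ~ feat_in[i]
    print("correct matches:", contrastive_correspondence_loss(feat_in, feat_aug, match_idx))
    print("wrong matches:  ", contrastive_correspondence_loss(feat_in, feat_aug, np.roll(match_idx, 1)))

    ref = rng.normal(size=(20, 32))
    hr = rng.normal(size=(16, 32))
    lr = hr + 0.5 * rng.normal(size=(16, 32))  # stand-in for degraded LR descriptors
    print("distillation loss:", correlation_distillation_loss(hr, lr, ref))
```

With these well-separated toy descriptors, the contrastive loss for the correct correspondences is close to zero, while shifting the match indices by one makes it large. The distillation loss is zero when student and teacher descriptors coincide and grows as the student drifts away from the teacher. Using a KL divergence over softmaxed correlations, rather than an L2 loss on raw similarities, transfers the teacher's full matching distribution, but that choice is an assumption of this sketch.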

Self-Reference Image Super-Resolution via Pre-trained Diffusion Large Model and Window Adjustable Transformer

SR-USRN: Learning Image Super-Resolution with Unified Structure and Reverse Network

Reference-Based Image Super-Resolution with Deformable Attention Transformer

CSwT-SR: Conv-Swin Transformer for Blind Remote Sensing Image Super-Resolution with Amplitude-Phase Learning and Structural Detail Alternating Learning

Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Effective Diffusion Transformer Architecture for Image Super-Resolution

Detail-Enhancing Framework for Reference-Based Image Super-Resolution

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

DSMA: Reference-Based Image Super-Resolution Method Based on Dual-View Supervised Learning and Multi-Attention Mechanism

RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

Enhanced Window-Based Self-Attention with Global and Multi-Scale Representations for Remote Sensing Image Super-Resolution

Task Decoupled Framework for Reference-based Super-Resolution

Learning Discrete Representations From Reference Images for Large Scale Factor Image Super-Resolution

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution

Dense Contrastive Learning and Depth Dynamic Aggregation for Reference-based Super-Resolution

Reference-Based Image and Video Super-Resolution via C2-Matching