Abstract: Video super-resolution (VSR), which exploits multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (referred to as the reference frame), VSR methods usually first align multiple neighboring frames (referred to as supporting frames) to the reference frame so that they provide more relevant information. Existing VSR methods usually employ deformable convolution for frame alignment, in which the whole supporting frame is aligned to the reference frame without a specific target and without supervision. Consequently, the aligned features are not explicitly learned to provide HR frame information and cannot fully exploit the supporting frames. To address this problem, we propose a novel video super-resolution framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame, each position corresponding to a different head. PGMH-A injects explicit position information to obtain multi-head-aligned features of the supporting frames that better represent the HR frame. PGMH-A can be trained individually or end-to-end with the ground-truth HR frames. Moreover, a Position-Guided Multi-Head Fusion module, termed PGMH-F, is developed based on the attention mechanism to further fuse spatial–temporal information across temporal supporting frames, across the multiple heads corresponding to different spatial positions of an HR frame, and across multiple channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) can provide VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks, and ablation studies verify the effectiveness of the proposed modules.
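The abstract does not specify the exact PGMH-AF architecture. As a rough illustration only, the NumPy sketch below assumes that, for an upscaling factor `scale`, the heads map one-to-one onto the `scale * scale` sub-pixel positions of each HR block (a pixel-shuffle style layout), and that aligned supporting-frame features are fused with softmax attention over the temporal axis. The function names `heads_to_hr` and `fuse_frames` are hypothetical. The alignment step itself (deformable convolution or PGMH-A) is not reproduced, and the head-wise and channel-wise fusion of PGMH-F is omitted.

```python
import numpy as np


def heads_to_hr(head_feats: np.ndarray, scale: int) -> np.ndarray:
    """Place per-head features onto an HR grid (assumed pixel-shuffle layout).

    head_feats: (scale*scale, C, H, W); head k fills sub-pixel offset
    (k // scale, k % scale) inside every scale x scale HR block.
    Returns an array of shape (C, H*scale, W*scale).
    """
    n_heads, c, h, w = head_feats.shape
    assert n_heads == scale * scale, "one head per sub-pixel position"
    x = head_feats.reshape(scale, scale, c, h, w)  # (dy, dx, C, H, W)
    x = x.transpose(2, 3, 0, 4, 1)                 # (C, H, dy, W, dx)
    return x.reshape(c, h * scale, w * scale)      # row = i*scale+dy, col = j*scale+dx


def fuse_frames(aligned: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Temporal softmax attention over T aligned supporting frames.

    aligned:   (T, heads, C, H, W) features, already aligned per head.
    reference: (C, H, W) reference-frame features used as the query.
    Weights are computed per frame, per head and per pixel from a scaled
    dot product over channels; they sum to 1 across the T frames.
    """
    c = aligned.shape[2]
    scores = (aligned * reference[None, None]).sum(axis=2, keepdims=True) / np.sqrt(c)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # (T, heads, 1, H, W)
    return (weights * aligned).sum(axis=0)               # (heads, C, H, W)


# Usage with random stand-ins for learned features.
rng = np.random.default_rng(0)
T, scale, C, H, W = 3, 2, 8, 4, 4
aligned = rng.standard_normal((T, scale * scale, C, H, W))
reference = rng.standard_normal((C, H, W))
fused = fuse_frames(aligned, reference)  # (4, 8, 4, 4)
hr_feat = heads_to_hr(fused, scale)
print(hr_feat.shape)  # (8, 8, 8)
```

Tying each head to a fixed sub-pixel offset is what gives every aligned feature an explicit spatial target in this sketch; in a trained network the random arrays would be replaced by features produced by the alignment module.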

Multi-Frequency Representation Enhancement with Privilege Information for Video Super-Resolution

Multi-Scale Cross-Attention Fusion Network Based on Image Super-Resolution

Multi-Stage Feature Fusion Network for Video Super-Resolution

FM-VSR: Feature Multiplexing Video Super-Resolution for Compressed Video

Boosting Image Super-Resolution Via Fusion of Complementary Information Captured by Multi-Modal Sensors

Multi-Source Deep Residual Fusion Network for Depth Image Super-resolution

Multi-Memory Convolutional Neural Network for Video Super-Resolution

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Single-image Super-Resolution Via Selective Multi-Scale Network

Enhanced Bidirectional Propagation Network for Video Super-Resolution

Position-Guided Multi-Head Alignment and Fusion for Video Super-Resolution

Learning Degradation-Robust Spatiotemporal Frequency-Transformer for Video Super-Resolution

FFFN: Frame-By-Frame Feedback Fusion Network for Video Super-Resolution

Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

Multi-Reference-Based Cross-Scale Feature Fusion for Compressed Video Super Resolution

How Video Super-Resolution and Frame Interpolation Mutually Benefit

Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution

Multi-Branch Networks for Video Super-Resolution with Dynamic Reconstruction Strategy

Image Super-Resolution Reconstruction Model Based on Multi-Feature Fusion

A Single Image High-Perception Super-Resolution Reconstruction Method Based on Multi-layer Feature Fusion Model with Adaptive Compression and Parameter Tuning

Dual feature enhanced video super-resolution network based on low-light scenarios