Abstract: Video super-resolution (VSR), which exploits multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (the reference frame), VSR methods usually first align multiple neighboring frames (the supporting frames) to the reference frame so that they provide more relevant information. Existing VSR methods usually employ deformable convolution for this frame alignment, aligning the whole supporting frame to the reference frame without a specific target and without supervision. As a result, the aligned features are not explicitly learned to provide HR frame information and cannot fully exploit the supporting frames. To address this problem, we propose a novel video super-resolution framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame (denoted by different heads). It injects explicit position information to obtain multi-head-aligned features of the supporting frames that better reconstruct the HR frame. PGMH-A can be trained individually or end-to-end with the ground-truth HR frames. Moreover, a Position-Guided Multi-Head Fusion module, termed PGMH-F, is developed based on the attention mechanism to further fuse the spatial–temporal information across temporal supporting frames, across the multiple heads corresponding to different spatial positions of an HR frame, and across multiple channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) provides VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks. Ablation studies have also been conducted to verify the effectiveness of the proposed modules.
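To make the idea of heads tied to HR positions more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes, purely for illustration, that an upscaling factor `scale` gives `scale * scale` heads and that head `k` holds the values for the HR sub-pixel offset `(k // scale, k % scale)` of every LR pixel; the abstract does not state this mapping. The function names `heads_to_hr` and `fuse_over_frames` are hypothetical, the deformable alignment step itself is omitted, and the fusion shown is a simplified softmax attention over the temporal axis only, whereas the abstract describes fusion across frames, heads, and channels.

```python
import numpy as np


def heads_to_hr(head_feats: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange per-head LR maps (scale*scale, H, W) into one (H*scale, W*scale) map.

    Assumption (not from the paper): head k corresponds to the HR sub-pixel
    offset (k // scale, k % scale) inside each LR pixel's scale x scale block.
    """
    n_heads, h, w = head_feats.shape
    assert n_heads == scale * scale, "expected one head per HR sub-pixel position"
    x = head_feats.reshape(scale, scale, h, w)  # axes: (dy, dx, y, x)
    x = x.transpose(2, 0, 3, 1)                 # axes: (y, dy, x, dx)
    return x.reshape(h * scale, w * scale)


def fuse_over_frames(aligned: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Fuse aligned supporting-frame features over time with softmax attention.

    aligned:   (T, K, H, W) features for T supporting frames and K heads.
    reference: (H, W) reference-frame feature map.
    The element-wise product used as the similarity score is a stand-in
    chosen for this sketch, not the paper's attention formulation.
    """
    scores = aligned * reference[None, None]              # (T, K, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)         # softmax over T
    return (weights * aligned).sum(axis=0)                # (K, H, W)
```

Under these assumptions, `heads_to_hr(fuse_over_frames(aligned, reference), scale)` would turn the fused per-head features into a single map on the HR grid, which is the same rearrangement a pixel-shuffle (depth-to-space) layer performs.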

Position-Guided Multi-Head Alignment and Fusion for Video Super-Resolution
