Abstract: Video super-resolution (VSR), which exploits multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (referred to as the reference frame), VSR methods usually first align multiple neighboring frames (referred to as supporting frames) to the reference frame in order to provide more relevant information. Existing VSR methods typically perform this alignment with deformable convolution, aligning the whole supporting frame to the reference frame without a specific target and without supervision. As a result, the aligned features are not explicitly learned to provide HR-frame information and cannot fully exploit the supporting frames. To address this problem, we propose a novel video super-resolution framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame, with each position handled by a different head. It injects explicit position information to obtain multi-head-aligned features of the supporting frames that better formulate the HR frame. PGMH-A can be trained individually or end-to-end with the ground-truth HR frames. Moreover, we develop a Position-Guided Multi-Head Fusion, termed PGMH-F, based on the attention mechanism to further fuse spatial–temporal information across the temporal supporting frames, across the multiple heads corresponding to different spatial positions of an HR frame, and across channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) can provide VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks, and ablation studies verify the effectiveness of the proposed modules.
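The abstract does not state how HR spatial positions map to heads. The sketch below is a minimal illustration under an assumption that is ours, not the authors': for an integer scale factor r, the HR frame is split into r×r sub-pixel positions, one per head, in the style of pixel shuffle (space-to-depth). Under that assumption, `hr_to_head_targets` shows how a ground-truth HR frame could yield one LR-sized supervision target per head, and `heads_to_hr` reassembles the head outputs. `fuse_across_frames` is a generic softmax-weighted fusion over supporting frames, included only to illustrate attention-style temporal fusion; it is not PGMH-F. All function names are hypothetical.

```python
import math
from typing import List

Frame = List[List[float]]  # 2-D single-channel frame, indexed [row][col]


def hr_to_head_targets(hr: Frame, r: int) -> List[Frame]:
    """Space-to-depth: split an HR frame into r*r LR-sized sub-frames.

    ASSUMPTION (hypothetical mapping): head k = i * r + j holds the HR pixels
    at offset (i, j) inside every r x r block, i.e. one sub-pixel position.
    """
    h, w = len(hr), len(hr[0])
    if h % r or w % r:
        raise ValueError("HR size must be divisible by the scale factor r")
    heads = []
    for i in range(r):
        for j in range(r):
            heads.append([[hr[y][x] for x in range(j, w, r)]
                          for y in range(i, h, r)])
    return heads


def heads_to_hr(heads: List[Frame], r: int) -> Frame:
    """Depth-to-space: interleave r*r head outputs back into one HR frame."""
    lh, lw = len(heads[0]), len(heads[0][0])
    hr = [[0.0] * (lw * r) for _ in range(lh * r)]
    for k, head in enumerate(heads):
        i, j = divmod(k, r)  # sub-pixel offset handled by head k
        for y in range(lh):
            for x in range(lw):
                hr[y * r + i][x * r + j] = head[y][x]
    return hr


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_across_frames(aligned: List[Frame], scores: List[float]) -> Frame:
    """Generic attention-style fusion (NOT PGMH-F): weighted sum of T aligned
    feature maps, with weights softmax(scores), one score per frame."""
    weights = softmax(scores)
    h, w = len(aligned[0]), len(aligned[0][0])
    return [[sum(weights[t] * aligned[t][y][x] for t in range(len(aligned)))
             for x in range(w)]
            for y in range(h)]
```

For a 4×4 HR frame and r = 2, `hr_to_head_targets` returns four 2×2 targets, and `heads_to_hr` inverts the split exactly. Under this assumption, each head is supervised on a distinct sub-pixel phase rather than on the whole frame, which is one way to give alignment an explicit target.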

Multi-scale Non-local Bidirectional Fusion for Video Super-Resolution

Video Super-Resolution Based on Multiple Networks Merging

Multi-Stage Feature Fusion Network for Video Super-Resolution

Video super-resolution with phase-aided deformable alignment network

Enhanced Bidirectional Propagation Network for Video Super-Resolution

FFFN: Frame-By-Frame Feedback Fusion Network for Video Super-Resolution

You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution

Video super-resolution via dense non-local spatial-temporal convolutional network

Video Super-Resolution Via a Spatio-Temporal Alignment Network

Multi-Reference-Based Cross-Scale Feature Fusion for Compressed Video Super Resolution

Attention-guided dual spatial-temporal non-local network for video super-resolution

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Multi-Memory Convolutional Neural Network for Video Super-Resolution

How Video Super-Resolution and Frame Interpolation Mutually Benefit

Multi-Source Deep Residual Fusion Network for Depth Image Super-resolution

Multi-Branch Networks for Video Super-Resolution with Dynamic Reconstruction Strategy

Frame and Feature-Context Video Super-Resolution

Position-Guided Multi-Head Alignment and Fusion for Video Super-Resolution

Fast Spatio-Temporal Residual Network For Video Super-Resolution

A Progressive Fusion Generative Adversarial Network for Realistic and Consistent Video Super-Resolution

FM-VSR: Feature Multiplexing Video Super-Resolution for Compressed Video