Abstract: Video super-resolution (VSR), which exploits multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (referred to as the reference frame), VSR methods usually first align multiple neighboring frames (referred to as supporting frames) to the reference frame in order to provide more relevant information. Existing VSR methods typically perform this alignment with deformable convolution, aligning the whole supporting frame to the reference frame without a specific target and without supervision. As a result, the aligned features are not explicitly learned to provide HR frame information and cannot fully exploit the supporting frames. To address this problem, we propose a novel video super-resolution framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame, each position corresponding to a different head. It injects explicit position information to obtain multi-head-aligned features of the supporting frames that better formulate the HR frame. PGMH-A can be trained either individually or end-to-end, using the ground-truth HR frames as supervision. Moreover, we develop an attention-based Position-Guided Multi-Head Fusion, termed PGMH-F, which further fuses spatial–temporal information across temporal supporting frames, across the multiple heads corresponding to different spatial positions of the HR frame, and across channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) provides VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks, and ablation studies verify the effectiveness of the proposed modules.
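The abstract does not describe the module internals, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' method. It assumes that, for an upscaling factor r, each "head" corresponds to one of the r×r sub-pixel offsets (dy, dx) of the HR frame (a space-to-depth reading of "different spatial positions of the HR frame"). Under that reading, the ground-truth HR frame can be split into r² LR-sized targets, one per head, and a pixel shuffle reassembles them. The temporal part of the fusion is stood in for by a generic softmax attention over supporting-frame features; fusion across heads and channels is not shown. The names `split_positions`, `merge_positions`, and `fuse_frames` are hypothetical.

```python
import math
from typing import List

Frame = List[List[float]]  # 2-D single-channel frame indexed [row][col]


def split_positions(hr: Frame, r: int) -> List[Frame]:
    """Space-to-depth: split an HR frame into r*r position-specific sub-frames.

    Assumption (hypothetical): head k = dy * r + dx targets the HR pixels at
    offset (dy, dx) inside every r x r block, so each sub-frame has LR size.
    """
    h, w = len(hr), len(hr[0])
    assert h % r == 0 and w % r == 0, "HR size must be divisible by r"
    heads = []
    for dy in range(r):
        for dx in range(r):
            heads.append([[hr[y][x] for x in range(dx, w, r)]
                          for y in range(dy, h, r)])
    return heads


def merge_positions(heads: List[Frame], r: int) -> Frame:
    """Depth-to-space (pixel shuffle): the exact inverse of split_positions."""
    lh, lw = len(heads[0]), len(heads[0][0])
    hr = [[0.0] * (lw * r) for _ in range(lh * r)]
    for k, sub in enumerate(heads):
        dy, dx = divmod(k, r)  # recover the sub-pixel offset of head k
        for y in range(lh):
            for x in range(lw):
                hr[y * r + dy][x * r + dx] = sub[y][x]
    return hr


def fuse_frames(ref: List[float], supports: List[List[float]]) -> List[float]:
    """Temporal attention for one head at one location (generic stand-in).

    Each supporting feature vector is weighted by a softmax over its
    dot-product similarity to the reference feature, then summed.
    """
    scores = [sum(a * b for a, b in zip(ref, s)) for s in supports]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(wt * s[c] for wt, s in zip(weights, supports))
            for c in range(len(ref))]
```

For a 4×4 HR frame and r = 2, `split_positions` yields four 2×2 targets, and `merge_positions` restores the original frame exactly. In this reading, each head's aligned features could be supervised by its own target, which is one way alignment could be given "a specific target" and supervision.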

Video Super-Resolution with Pyramid Flow-Guided Deformable Alignment Network

Video super-resolution with phase-aided deformable alignment network

Deep Video Super-Resolution with Flow-Guided Deformable Alignment and Sparsity-based Temporal-Spatial Enhancement

DVSRNet: Deep Video Super-Resolution Based on Progressive Deformable Alignment and Temporal-Sparse Enhancement

Video Super-Resolution Via Nonlocal Deformable Alignment and Frame Recursive Progressive Fusion Network

FGBRSN: Flow-Guided Gated Bi-Directional Recurrent Separated Network for Video Super-Resolution

Grouped Spatio-Temporal Alignment Network for Video Super-Resolution

Space-time Super-Resolution with Motion-Perceptive Deformable Alignment

Video super-resolution with fused local and nonlocal feature

Video Super-Resolution Via a Spatio-Temporal Alignment Network

DFVSR: Directional Frequency Video Super-Resolution Via Asymmetric and Enhancement Alignment Network

FM-VSR: Feature Multiplexing Video Super-Resolution for Compressed Video

Spatio-Temporal Fusion Network for Video Super-Resolution

Position-Guided Multi-Head Alignment and Fusion for Video Super-Resolution

Video Super-Resolution with Recurrent High and Low-Frequency Information Propagation

Enhanced Bidirectional Propagation Network for Video Super-Resolution

Deep Plug-and-Play Video Super-Resolution

Deep Video Super-Resolution Using HR Optical Flow Estimation

Real-World Video Super-Resolution with a Degradation-Adaptive Model

Multi-Stage Feature Fusion Network for Video Super-Resolution

You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution