Abstract: Video super-resolution (VSR), which uses multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (the reference frame), VSR methods usually first align several neighboring frames (the supporting frames) to the reference frame so that they provide more relevant information. Existing VSR methods typically perform this alignment with deformable convolution, aligning the whole supporting frame to the reference frame without a specific target and without supervision. As a result, the aligned features are not explicitly learned to provide HR frame information and do not fully exploit the supporting frames. To address this problem, we propose a novel VSR framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame, with each position handled by a separate head. PGMH-A injects explicit position information to obtain multi-head-aligned features of the supporting frames that better reconstruct the HR frame, and it can be trained either individually or end-to-end with the ground-truth HR frames. In addition, we develop a Position-Guided Multi-Head Fusion module, termed PGMH-F, which uses the attention mechanism to fuse spatial–temporal information across the temporal supporting frames, across the multiple heads corresponding to different spatial positions of an HR frame, and across multiple channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) provides VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks, and ablation studies verify the effectiveness of the proposed modules.
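
The multi-head idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on an assumption the abstract does not state: that for an upscaling factor `scale` there are `scale * scale` heads, one per sub-pixel position inside each LR pixel. The function names (`head_offsets`, `fuse_frames`, `assemble_hr`) are hypothetical. The fusion step here is a plain softmax over the temporal axis only, whereas PGMH-F as described also fuses across heads and channels.

```python
import numpy as np


def head_offsets(scale):
    """Sub-pixel offset (dy, dx) of each head, in LR pixel units,
    measured from the centre of the LR pixel (assumed head layout)."""
    centres = (np.arange(scale) + 0.5) / scale - 0.5
    dy, dx = np.meshgrid(centres, centres, indexing="ij")
    return np.stack([dy.ravel(), dx.ravel()], axis=1)  # (scale*scale, 2)


def fuse_frames(aligned, scores):
    """Softmax-weighted sum over supporting frames (temporal axis only).

    aligned, scores: arrays of shape (T, heads, H, W).
    Returns an array of shape (heads, H, W).
    """
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * aligned).sum(axis=0)


def assemble_hr(heads, scale):
    """Interleave per-head maps (scale*scale, H, W) into one HR map
    (H*scale, W*scale), so that HR[y*scale+i, x*scale+j] = heads[i*scale+j, y, x]."""
    n, h, w = heads.shape
    if n != scale * scale:
        raise ValueError("expected scale*scale heads")
    grid = heads.reshape(scale, scale, h, w)          # axes: (i, j, y, x)
    return grid.transpose(2, 0, 3, 1).reshape(h * scale, w * scale)


if __name__ == "__main__":
    T, scale, H, W = 5, 2, 3, 3
    rng = np.random.default_rng(0)
    aligned = rng.normal(size=(T, scale * scale, H, W))  # stand-in aligned features
    scores = rng.normal(size=(T, scale * scale, H, W))   # stand-in attention logits
    hr = assemble_hr(fuse_frames(aligned, scores), scale)
    print(head_offsets(scale))
    print(hr.shape)
```

Under this assumed layout, each head is responsible for one fixed sub-pixel position, which is one way to read "aligning supporting frames to different spatial positions of the HR frame"; the interleaving step is the same rearrangement used by sub-pixel (depth-to-space) upsampling.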
