Abstract: Omnidirectional video (ODV) provides an immersive experience and is widely used in virtual reality and augmented reality. However, limitations in capture devices and transmission bandwidth lead to low-resolution ODVs. Video super-resolution (VSR) methods have been proposed to enhance video resolution, but applying them directly to ODVs does not adequately address ODV projection distortions. To achieve better super-resolution reconstruction quality, we propose a novel Spatio-Temporal Distortion Aware Network (STDAN) oriented to ODV characteristics. Specifically, a spatio-temporal distortion modulation module is introduced to mitigate spatial ODV projection distortions and exploit temporal correlation through intra and inter alignments. Next, we design a multi-frame reconstruction and fusion mechanism to refine the consistency of reconstructed ODV frames. Furthermore, we incorporate latitude-saliency adaptive maps in the loss function to concentrate on important viewpoint regions with higher texture complexity and human-watching interest. In addition, we collect a new ODV-SR dataset covering various scenarios. Extensive experimental results demonstrate that the proposed STDAN achieves superior super-resolution performance on ODVs and outperforms state-of-the-art methods.

What problem does this paper attempt to address?

This paper addresses the low resolution of panoramic videos (omnidirectional videos, ODVs) in virtual reality and augmented reality applications. Because of limitations in capture devices and transmission bandwidth, ODVs typically have low resolution, which degrades the user's visual experience. Conventional video super-resolution (VSR) methods can enhance the resolution of regular videos, but they do not handle projection distortion well when applied directly to ODVs, so reconstruction quality suffers. To overcome this, the paper proposes a Spatio-Temporal Distortion-Aware Network (STDAN) designed around the characteristics of ODVs.

The main contributions of the paper are:

1. **Spatio-Temporal Distortion-Aware Super-Resolution Network (STDAN)**: The network introduces a spatio-temporal distortion modulation module that jointly compensates for spatio-temporal distortions, accounting for the spatial projection distortions of ODVs and exploiting their temporal motion similarities.
2. **Multi-Frame Reconstruction and Fusion Module**: A joint multi-frame reconstruction and fusion module, operating on multiple frames at different time stages, is designed to ensure the temporal consistency and smoothness of the reconstructed ODVs.
3. **Latitude-Saliency Adaptive Loss Function**: Latitude-related distortion maps and saliency-guided attention maps are computed and used to weight the loss function, so that the model focuses on important salient regions and optimizes perceptual quality.

Through these contributions, the paper aims to improve ODV super-resolution performance and to outperform existing state-of-the-art methods in both objective metrics and subjective experience.
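To make the idea of a latitude-saliency adaptive loss more concrete, the following is a minimal sketch, not the paper's actual formulation, which the summary above does not give. It assumes frames are stored in equirectangular projection and uses the cosine-of-latitude weighting familiar from WS-PSNR as the latitude map. The function names, the blending parameter `alpha`, and the linear blend of the two maps are hypothetical choices made only for illustration.

```python
import numpy as np

def latitude_weights(height, width):
    """Per-pixel cos(latitude) weights for an equirectangular frame.

    Assumption: rows map linearly to latitude in [-pi/2, pi/2], so rows
    near the poles (heavily stretched by the projection) get low weight.
    """
    rows = np.arange(height)
    lat = (rows + 0.5 - height / 2.0) * np.pi / height
    return np.repeat(np.cos(lat)[:, None], width, axis=1)

def latitude_saliency_l1(pred, target, saliency, alpha=0.5):
    """Weighted L1 loss for (H, W) or (H, W, C) arrays.

    `saliency` is an (H, W) map in [0, 1]; `alpha` (hypothetical) blends
    the latitude map with the saliency map. The loss is normalized by the
    total weight, so a uniform error of e yields a loss of e.
    """
    h, w = pred.shape[:2]
    weight = alpha * latitude_weights(h, w) + (1.0 - alpha) * saliency
    if pred.ndim == 3:
        weight = weight[..., None]  # broadcast over channels
    err = np.abs(pred - target)
    weight_full = np.broadcast_to(weight, err.shape)
    return float((weight_full * err).sum() / weight_full.sum())

# Usage: a random 8x16 RGB frame with a uniform saliency map.
rng = np.random.default_rng(0)
target = rng.random((8, 16, 3))
pred = target + 0.1  # constant error of 0.1 everywhere
saliency = np.ones((8, 16))
loss = latitude_saliency_l1(pred, target, saliency)
```

Normalizing by the total weight keeps the loss on the same scale as a plain L1 loss, so changing `alpha` only shifts which regions dominate the loss rather than its overall magnitude.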

Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution

Video super-resolution with phase-aided deformable alignment network

STDAN: Deformable Attention Network for Space-Time Video Super-Resolution

Stereo Video Super-Resolution Via Exploiting View-Temporal Correlations

Omniscient Video Super-Resolution with Explicit-Implicit Alignment

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

Real-World Video Super-Resolution with a Degradation-Adaptive Model

SAVSR: Arbitrary-Scale Video Super-Resolution Via a Learned Scale-Adaptive Network

OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer

Omnidirectional Image Super-resolution Via Bi-projection Fusion

Deformable Kernel Convolutional Network for Video Extreme Super-Resolution

Video Super-Resolution Via a Spatio-Temporal Alignment Network

ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

Super-resolution of Omnidirectional Images Using Adversarial Learning

DVSRNet: Deep Video Super-Resolution Based on Progressive Deformable Alignment and Temporal-Sparse Enhancement

Spatio-Temporal Fusion Network for Video Super-Resolution

Video Super-Resolution Via Nonlocal Deformable Alignment and Frame Recursive Progressive Fusion Network

Boosting Video Super Resolution with Patch-Based Temporal Redundancy Optimization

Omnidirectional image super-resolution via position attention network

Video Super-Resolution Reconstruction Based on Deep Learning and Spatio-Temporal Feature Self-similarity