3DAttGAN: A 3D Attention-based Generative Adversarial Network for Joint Space-Time Video Super-Resolution

Congrui Fu, Hui Yuan, Liquan Shen, Raouf Hamzaoui, Hao Zhang

2024-07-24

Abstract: In many applications, including surveillance, entertainment, and restoration, there is a need to increase both the spatial resolution and the frame rate of a video sequence. The aim is to improve visual quality, refine details, and create a more realistic viewing experience. Existing space-time video super-resolution methods do not effectively use spatio-temporal information. To address this limitation, we propose a generative adversarial network for joint space-time video super-resolution. The generative network consists of three operations: shallow feature extraction, deep feature extraction, and reconstruction. It uses three-dimensional (3D) convolutions to process temporal and spatial information simultaneously and includes a novel 3D attention mechanism to extract the most important channel and spatial information. The discriminative network uses a two-branch structure to handle details and motion information, making the generated results more accurate. Experimental results on the Vid4, Vimeo-90K, and REDS datasets demonstrate the effectiveness of the proposed method. The source code is publicly available at https://github.com/FCongRui/3DAttGan.git.

Image and Video Processing

What problem does this paper attempt to address?

The paper addresses space-time video super-resolution (STSR): increasing both the spatial resolution and the frame rate of a video. The authors propose a generative adversarial network (GAN), the 3D Attention-based Generative Adversarial Network (3DAttGAN), that performs both tasks jointly. Existing STSR methods typically perform spatial super-resolution (SSR) and temporal super-resolution (TSR) independently, which is inefficient and fails to fully exploit spatio-temporal information. 3DAttGAN addresses this in three ways:

1. **3D Convolution Operations**: Process temporal and spatial information simultaneously rather than separately.
2. **3D Attention Mechanism**: Extends the traditional 2D attention mechanism to 3D convolutional networks so that the most important channel and spatial information is extracted from spatio-temporal features.
3. **Dual-Branch Discriminator**: One branch evaluates the detail of individual video frames, while the other assesses the motion information between frames, making the generated results more accurate.

Experimental results on the Vid4, Vimeo-90K, and REDS datasets show that the method performs well in texture-rich and high-motion scenes and is reported to outperform existing STSR methods.
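To make the idea of attention over 3D (space-time) features more concrete, here is a minimal pure-Python sketch. It is not the paper's module: it assumes a simplified gate in which a feature volume indexed `[channel][frame][row][col]` is average-pooled and passed through a sigmoid, with the learned layers that a real attention block would place between pooling and sigmoid left out. The function names `channel_attention_3d` and `spatial_attention_3d` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_3d(features):
    """Scale each channel by one gate computed from the whole channel.

    features is a nested list indexed [channel][frame][row][col].
    Assumption: the gate is sigmoid(mean of the channel); a trained
    network would insert learned layers before the sigmoid.
    """
    gated = []
    for channel in features:
        # Global average pooling over the time, height and width axes.
        values = [v for frame in channel for row in frame for v in row]
        gate = sigmoid(sum(values) / len(values))
        gated.append([[[v * gate for v in row] for row in frame]
                      for frame in channel])
    return gated


def spatial_attention_3d(features):
    """Scale each (frame, row, col) position by one gate shared by all channels.

    Assumption: the gate is sigmoid(mean across channels at that position).
    """
    n_ch = len(features)
    n_t = len(features[0])
    n_h = len(features[0][0])
    n_w = len(features[0][0][0])
    gated = [[[[0.0] * n_w for _ in range(n_h)] for _ in range(n_t)]
             for _ in range(n_ch)]
    for t in range(n_t):
        for y in range(n_h):
            for x in range(n_w):
                mean = sum(features[c][t][y][x] for c in range(n_ch)) / n_ch
                gate = sigmoid(mean)
                for c in range(n_ch):
                    gated[c][t][y][x] = features[c][t][y][x] * gate
    return gated


# Toy volume: 2 channels, 2 frames, each frame 1 row by 2 columns.
toy = [
    [[[0.0, 0.0]], [[0.0, 0.0]]],  # channel 0: all zeros
    [[[2.0, 2.0]], [[2.0, 2.0]]],  # channel 1: all twos
]
channel_out = channel_attention_3d(toy)
spatial_out = spatial_attention_3d(toy)
```

Both functions keep the input shape. The difference from 2D attention in this sketch is only that pooling and gating also run along the frame axis, so a gate can respond to how a feature behaves over time as well as over space.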

Video super-resolution with phase-aided deformable alignment network

Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

High-order relational generative adversarial network for video super-resolution

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

Learning for Unconstrained Space-Time Video Super-Resolution

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Video Super-Resolution With Temporal Group Attention

Video Super-Resolution Via a Spatio-Temporal Alignment Network

A multiresolution mixture generative adversarial network for video super-resolution

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Cuboid-Net: A Multi-Branch Convolutional Neural Network for Joint Space-Time Video Super Resolution

Deformable 3D Convolution for Video Super-Resolution

Video super-resolution with 3D adaptive normalized convolution

A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution

FRAGAN-VSR - Frame-Recurrent Attention Generative Adversarial Network for Video Super-Resolution

SuperGS: Super-Resolution 3D Gaussian Splatting Enhanced by Variational Residual Features and Uncertainty-Augmented Learning

STDAN: Deformable Attention Network for Space-Time Video Super-Resolution

Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution

VEnhancer: Generative Space-Time Enhancement for Video Generation

Improving Generative Adversarial Networks for Video Super-Resolution