3DAttGAN: A 3D Attention-based Generative Adversarial Network for Joint Space-Time Video Super-Resolution

Congrui Fu, Hui Yuan, Liquan Shen, Raouf Hamzaoui, Hao Zhang
2024-07-24
Abstract: In many applications, including surveillance, entertainment, and restoration, there is a need to increase both the spatial resolution and the frame rate of a video sequence. The aim is to improve visual quality, refine details, and create a more realistic viewing experience. Existing space-time video super-resolution methods do not effectively use spatio-temporal information. To address this limitation, we propose a generative adversarial network for joint space-time video super-resolution. The generative network consists of three operations: shallow feature extraction, deep feature extraction, and reconstruction. It uses three-dimensional (3D) convolutions to process temporal and spatial information simultaneously and includes a novel 3D attention mechanism to extract the most important channel and spatial information. The discriminative network uses a two-branch structure to handle details and motion information, making the generated results more accurate. Experimental results on the Vid4, Vimeo-90K, and REDS datasets demonstrate the effectiveness of the proposed method. The source code is publicly available at https://github.com/FCongRui/3DAttGan.git.
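The key idea behind processing "temporal and spatial information simultaneously" is that a 3D convolution kernel spans several frames as well as a pixel neighbourhood. The snippet below is a minimal NumPy sketch of one generic 3D convolution layer, not the 3DAttGAN generator itself. The function name `conv3d_same`, the (channels, frames, height, width) layout, and the zero "same" padding are illustrative assumptions.

```python
import numpy as np


def conv3d_same(x, w, b=None):
    """Generic 3D convolution (cross-correlation, as in deep-learning libraries),
    stride 1, zero 'same' padding. Illustrative sketch, not the 3DAttGAN code.

    x: input features, shape (C_in, T, H, W)
    w: kernels, shape (C_out, C_in, kt, kh, kw) with odd kernel sizes
    b: optional bias, shape (C_out,)
    returns: output features, shape (C_out, T, H, W)
    """
    c_out, c_in, kt, kh, kw = w.shape
    assert x.shape[0] == c_in, "channel mismatch"
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    # Zero-pad time, height and width so the output keeps the input size.
    xp = np.pad(x, ((0, 0), (pt, pt), (ph, ph), (pw, pw)))
    _, t, h, wd = x.shape
    out = np.zeros((c_out, t, h, wd), dtype=np.result_type(x, w))
    # Accumulate one kernel tap at a time. Each tap (dt, dy, dx) shifts the
    # input in time AND space, so neighbouring frames and pixels are mixed jointly.
    for dt in range(kt):
        for dy in range(kh):
            for dx in range(kw):
                patch = xp[:, dt:dt + t, dy:dy + h, dx:dx + wd]  # (C_in, T, H, W)
                # Contract over input channels: (C_out, C_in) x (C_in, T, H, W).
                out += np.einsum("oc,cthw->othw", w[:, :, dt, dy, dx], patch)
    if b is not None:
        out += b[:, None, None, None]
    return out


rng = np.random.default_rng(0)
frames = rng.standard_normal((3, 4, 8, 8))        # 3 channels, 4 frames, 8x8 pixels
kernels = rng.standard_normal((16, 3, 3, 3, 3)) * 0.1
features = conv3d_same(frames, kernels)
print(features.shape)  # (16, 4, 8, 8)

# An impulse in frame 2 spreads to frames 1-3: one kernel covers time and space.
impulse = np.zeros((1, 5, 5, 5))
impulse[0, 2, 2, 2] = 1.0
resp = conv3d_same(impulse, np.ones((1, 1, 3, 3, 3)))
print([int(resp[0, t].sum()) for t in range(5)])  # [0, 9, 9, 9, 0]
```

A 2D convolution applied frame by frame would keep the impulse confined to frame 2. The 3D kernel lets the network relate content across adjacent frames in the same operation that processes spatial detail.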
Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the problem of space-time video super-resolution (STSR). Specifically, the authors propose a novel generative adversarial network (GAN), the 3D Attention-based Generative Adversarial Network (3DAttGAN), that jointly increases the spatial resolution and the frame rate of a video. Existing STSR methods typically perform spatial super-resolution (SSR) and temporal super-resolution (TSR) as separate steps. This is inefficient and does not fully exploit spatio-temporal information. To address this issue, 3DAttGAN improves on existing methods in three ways:

1. **3D convolution operations**: temporal and spatial information are processed simultaneously rather than separately.
2. **3D attention mechanism**: the conventional 2D attention mechanism is extended to 3D convolutional networks, so that the most important channel and spatio-temporal features are emphasized.
3. **Dual-branch discriminator**: one branch evaluates the details of individual video frames, while the other assesses the motion information between frames, which makes the generated results more accurate.

Experimental results on the Vid4, Vimeo-90K, and REDS datasets show that the method performs particularly well in texture-rich and high-motion scenes and outperforms the existing STSR methods it is compared against.
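The exact design of the 3DAttGAN attention module is not described here. The sketch below assumes a CBAM-style design, applying channel attention first and then spatial attention, with pooling and the spatial kernel both extended over the time axis. It is meant to show what "extending 2D attention to 3D" can look like. All names (`attention_3d`, `w1`, `w2`, `k`), the reduction ratio `r`, and the 3x7x7 kernel size are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention_3d(x, w1, w2):
    """One weight in (0, 1) per channel for features x of shape (C, T, H, W).

    Squeezes T, H and W with average and max pooling, then passes both
    descriptors through a shared two-layer MLP (w1: (C//r, C), w2: (C, C//r)).
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)      # shared MLP with ReLU

    avg = x.mean(axis=(1, 2, 3))                 # (C,)
    mx = x.max(axis=(1, 2, 3))                   # (C,)
    return sigmoid(mlp(avg) + mlp(mx))           # (C,)


def spatial_attention_3d(x, k):
    """One weight in (0, 1) per (t, y, x) location for x of shape (C, T, H, W).

    Pools across channels (average and max), then applies a single 3D kernel
    k of shape (2, kt, kh, kw) with zero 'same' padding.
    """
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, T, H, W)
    _, kt, kh, kw = k.shape
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    dp = np.pad(desc, ((0, 0), (pt, pt), (ph, ph), (pw, pw)))
    _, t, h, wd = desc.shape
    acc = np.zeros((t, h, wd))
    for dt in range(kt):
        for dy in range(kh):
            for dx in range(kw):
                window = dp[:, dt:dt + t, dy:dy + h, dx:dx + wd]  # (2, T, H, W)
                # Weighted sum over the two pooled descriptors -> (T, H, W).
                acc += np.tensordot(k[:, dt, dy, dx], window, axes=1)
    return sigmoid(acc)                                   # (T, H, W)


def attention_3d(x, w1, w2, k):
    """Assumed CBAM-style ordering: channel attention, then space-time attention."""
    x = x * channel_attention_3d(x, w1, w2)[:, None, None, None]
    return x * spatial_attention_3d(x, k)[None]


rng = np.random.default_rng(0)
C, r = 16, 4                                       # channels, hypothetical reduction ratio
feats = rng.standard_normal((C, 4, 8, 8))          # (C, T, H, W)
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
k = rng.standard_normal((2, 3, 7, 7)) * 0.1        # hypothetical 3x7x7 space-time kernel
out = attention_3d(feats, w1, w2, k)
print(out.shape)  # (16, 4, 8, 8)
```

Compared with a 2D version, the only changes are that pooling runs over (T, H, W) instead of (H, W) and that the spatial-attention kernel gains a temporal extent. The resulting attention map can therefore highlight regions that matter across neighbouring frames, such as moving edges.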