Abstract: By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance the visual experience and support more efficient dissemination of information. We propose a convolutional neural network (CNN) for space-time video super-resolution, named GIRNet. To generate highly accurate features and thus improve performance, the proposed network combines a feature-level temporal interpolation module based on deformable convolutions with a global spatial-temporal information-based residual convolutional long short-term memory (convLSTM) module. In the feature-level temporal interpolation module, we use deformable convolution, which adapts to the deformations and scale variations of objects at different locations in a scene. This makes it a more efficient solution than conventional convolution for extracting features from moving objects. The network uses both forward and backward feature information to determine inter-frame offsets, from which the features of the interpolated frames are generated directly. In the global spatial-temporal information-based residual convLSTM module, the first convLSTM derives global spatial-temporal information from the input features, and the second convLSTM takes this global spatial-temporal information feature as its initial cell state. The second convLSTM also adopts residual connections to preserve spatial information, thereby enhancing the output features. Experiments on the Vimeo90K dataset show that the proposed method outperforms state-of-the-art techniques in peak signal-to-noise ratio (PSNR; by 1.45 dB, 1.14 dB, and 0.02 dB over STARnet, TMNet, and 3DAttGAN, respectively), in structural similarity index (SSIM; by 0.027, 0.023, and 0.006 over STARnet, TMNet, and 3DAttGAN, respectively), and in visual quality.
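To put the reported PSNR gains in perspective, the sketch below computes PSNR from mean squared error (MSE) and converts a gain in dB into the corresponding reduction in MSE. It is an illustrative sketch, not the authors' evaluation code: the function names `psnr` and `mse_ratio_for_gain`, the 8-bit peak value of 255, and the assumption that PSNR is derived from a single aggregate MSE are all ours; the abstract does not specify the evaluation protocol.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit data).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must contain the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` decibels."""
    return 10.0 ** (-gain_db / 10.0)


# PSNR gains over each baseline, as reported in the abstract.
for name, gain in [("STARnet", 1.45), ("TMNet", 1.14), ("3DAttGAN", 0.02)]:
    print(f"{name}: MSE ratio {mse_ratio_for_gain(gain):.3f}")
```

Under these assumptions, the gains of 1.45 dB and 1.14 dB correspond to roughly 28% and 23% lower MSE, while the 0.02 dB gain over 3DAttGAN corresponds to a reduction of under 1%. If PSNR is instead averaged per frame or per sequence, this mapping holds only approximately.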

Deformable 3D Convolution for Video Super-Resolution

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

Video super-resolution with phase-aided deformable alignment network

Fast Spatio-Temporal Residual Network For Video Super-Resolution

Iterative Back Projection Network Based on Deformable 3D Convolution

Lightweight Image Super-Resolution Network Using 3D Convolutional Neural Networks

Video super-resolution with 3D adaptive normalized convolution

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Structure-preserving video super-resolution using three-dimensional convolutional neural networks

Video Super-Resolution Via a Spatio-Temporal Alignment Network

Deformable Kernel Convolutional Network for Video Extreme Super-Resolution

Video super-resolution via dense non-local spatial-temporal convolutional network

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions

Enhanced Video Super-Resolution Network Towards Compressed Data

3DAttGAN: A 3D Attention-based Generative Adversarial Network for Joint Space-Time Video Super-Resolution

Video Super-Resolution Reconstruction Based on Deep Convolutional Neural Network and Spatio-Temporal Similarity

Video Super-Resolution Based on Spatial-Temporal Recurrent Residual Networks

DVSRNet: Deep Video Super-Resolution Based on Progressive Deformable Alignment and Temporal-Sparse Enhancement

Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting