Abstract: Video super-resolution aims to recover high-resolution (HR) content from low-resolution (LR) observations by compositing the spatial-temporal information in the LR frames. Because video sequences are three-dimensional spatial-temporal signals, it is crucial to model the spatial and temporal information jointly. Compared with explicitly estimating motion between 2D frames, 3D convolutional neural networks (CNNs) have been shown to be efficient and effective for video super-resolution (SR), as a natural way of modelling spatial-temporal data. Though promising, the performance of 3D CNNs is still far from satisfactory: their high computational and memory requirements limit the development of more advanced designs that extract and fuse information over larger spatial and temporal scales. We thus propose a Mixed Spatial-Temporal Convolution (MSTC) block that simultaneously extracts spatial information and the supplementary temporal dependency among frames by jointly applying 2D and 3D convolutions. To further fuse the learned features corresponding to different frames, we propose a novel similarity-based selective feature fusion strategy, unlike previous methods that directly stack the learned features. Additionally, an attention-based motion compensation module is applied to alleviate the influence of misalignment between frames. Experiments on three widely used benchmark datasets and a real-world dataset show that, owing to its superior feature extraction and fusion ability, the proposed network outperforms previous state-of-the-art methods, especially in recovering confusing details.
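To make the idea of jointly applying 2D and 3D convolution concrete, the following is a minimal NumPy sketch, not the MSTC block as the authors implemented it. It assumes a single-channel clip of shape (T, H, W), odd kernel sizes, zero padding, and fusion of the two branches by summation followed by a ReLU; the function names `conv2d_per_frame`, `conv3d`, and `mixed_block` are hypothetical, and the abstract does not specify how the two branches are actually combined.

```python
import numpy as np


def conv2d_per_frame(video, kernel):
    """Spatial branch: apply one 2D kernel to every frame independently.

    video: (T, H, W) array, kernel: (kh, kw) array with odd sizes.
    Uses zero padding and cross-correlation, as is conventional in CNNs.
    """
    _, H, W = video.shape
    kh, kw = kernel.shape
    padded = np.pad(video, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(video.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[:, i:i + H, j:j + W]
    return out


def conv3d(video, kernel):
    """Spatial-temporal branch: one 3D kernel slides over time and space.

    video: (T, H, W) array, kernel: (kt, kh, kw) array with odd sizes.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    padded = np.pad(
        video, ((kt // 2, kt // 2), (kh // 2, kh // 2), (kw // 2, kw // 2))
    )
    out = np.zeros(video.shape, dtype=float)
    for t in range(kt):
        for i in range(kh):
            for j in range(kw):
                out += kernel[t, i, j] * padded[t:t + T, i:i + H, j:j + W]
    return out


def mixed_block(video, k2d, k3d):
    """Assumed fusion rule: sum the two branches, then apply a ReLU."""
    return np.maximum(conv2d_per_frame(video, k2d) + conv3d(video, k3d), 0.0)


# Toy clip: 3 frames of 4x4 pixels.
video = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)

# 2D kernel that passes each frame through unchanged.
k2d = np.zeros((3, 3))
k2d[1, 1] = 1.0

# 3D kernel that sums each pixel over its temporal neighbours.
k3d = np.zeros((3, 3, 3))
k3d[:, 1, 1] = 1.0

out = mixed_block(video, k2d, k3d)
```

In this toy setup the 2D branch sees only the current frame, while the 3D branch adds information from the neighbouring frames, which is the complementary role the abstract attributes to the two kinds of convolution.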

3D Video Super-Resolution Using Fully Convolutional Neural Networks

Lightweight Image Super-Resolution Network Using 3D Convolutional Neural Networks

Structure-preserving video super-resolution using three-dimensional convolutional neural networks

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Multi-Memory Convolutional Neural Network for Video Super-Resolution

Video super-resolution with 3D adaptive normalized convolution

Hyperspectral Image Spatial Super-Resolution Via 3D Full Convolutional Neural Network

Video super-resolution via dense non-local spatial-temporal convolutional network

Video Super-resolution Reconstruction Using Deep and Shallow Convolutional Neural Networks

Fast Spatio-Temporal Residual Network For Video Super-Resolution

AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

Iterative Back Projection Network Based on Deformable 3D Convolution

Low-Rank Constrained Super-Resolution for Mixed-Resolution Multiview Video

Super-multiview Integral Imaging Scheme Based on Sparse Camera Array and CNN Super-Resolution

Stereoscopic Image Super-Resolution Method with View Incorporation and Convolutional Neural Networks

Deformable 3D Convolution for Video Super-Resolution

Video Super-Resolution Via a Spatio-Temporal Alignment Network

Video Super-Resolution Via Residual Learning

Multi-Temporal Ultra Dense Memory Network for Video Super-Resolution

Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution