Abstract: Video super-resolution aims to recover high-resolution (HR) content from low-resolution (LR) observations by combining the spatial-temporal information in the LR frames. It is crucial to model the spatial and temporal information jointly, since video sequences are three-dimensional spatial-temporal signals. Compared with explicitly estimating motion between 2D frames, 3D convolutional neural networks (CNNs) have proven efficient and effective for video super-resolution (SR), as they are a natural way to model spatial-temporal data. Though promising, the performance of 3D CNNs is still far from satisfactory. Their high computational and memory requirements limit the development of more advanced designs that extract and fuse information over larger spatial and temporal scales. We thus propose a Mixed Spatial-Temporal Convolution (MSTC) block that simultaneously extracts the spatial information and the supplementary temporal dependency among frames by jointly applying 2D and 3D convolution. To further fuse the learned features corresponding to different frames, we propose a novel similarity-based selective feature fusion strategy, unlike previous methods that directly stack the learned features. Additionally, an attention-based motion compensation module is applied to alleviate the influence of misalignment between frames. Experiments on three widely used benchmark datasets and a real-world dataset show that, relying on its superior feature extraction and fusion ability, the proposed network can outperform previous state-of-the-art methods, especially in recovering confusing details.
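
The abstract does not specify how the similarity-based selective fusion is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes per-pixel cosine similarity between each neighboring frame's features and the reference frame's features, turned into fusion weights by a softmax over frames. The function name `selective_fusion`, the `temperature` parameter, and the array layout are all hypothetical.

```python
import numpy as np


def selective_fusion(ref_feat, neighbor_feats, temperature=1.0):
    """Fuse neighbor-frame features, weighted by similarity to the reference.

    Hypothetical sketch: cosine similarity and softmax weighting are
    assumptions, not details taken from the paper.

    ref_feat:       array of shape (C, H, W), reference-frame features.
    neighbor_feats: array of shape (T, C, H, W), features of T frames.
    Returns the fused features (C, H, W) and the weights (T, H, W).
    """
    eps = 1e-8  # avoids division by zero for all-zero feature vectors

    # Normalize along the channel axis so the dot product is a cosine.
    ref_unit = ref_feat / (np.linalg.norm(ref_feat, axis=0, keepdims=True) + eps)
    nb_unit = neighbor_feats / (
        np.linalg.norm(neighbor_feats, axis=1, keepdims=True) + eps
    )

    # Per-pixel cosine similarity of every frame to the reference: (T, H, W).
    sim = np.sum(nb_unit * ref_unit[None], axis=1)

    # Numerically stable softmax over the frame axis.
    logits = sim / temperature
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum instead of plain stacking of the per-frame features.
    fused = np.sum(weights[:, None] * neighbor_feats, axis=0)
    return fused, weights
```

Compared with stacking features along the channel axis, a weighting of this kind lets frames whose features disagree with the reference (for example, because of misalignment) contribute less at the affected pixels.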

Temporal3D: 2D-to-3d Video Conversion Network with Multi-frame Fusion

Temporal Consistent Object Pose Estimation from Monocular Videos

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation.

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Fused Network for View Synthesis.

Temporal Feature Fusion for 3D Detection in Monocular Video

Temporally Consistent Depth Map Estimation Based On 3d-Mrf

A Temporally Streamlined Optimization Method for Stereo Video Correspondence

A Novel 2D-to-3D Video Conversion Method Using Time-Coherent Depth Maps

View Synthesis of Dynamic Scenes Based on Deep 3D Mask Volume

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks

Visual Pertinent 2D-to-3d Video Conversion by Multi-Cue Fusion

A Novel Method For Automatic 2d-To-3d Video Conversion

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Video super-resolution via mixed spatial-temporal convolution and selective fusion

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Video Super-Resolution With Temporal Group Attention

Multi-view 3D Reconstruction from Video with Transformer.