Abstract:Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies since we observed that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, a Long-term Temporal Difference Module (L-TDM) is proposed, where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. Moreover, we further propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and temporal compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at <a class="link-external link-https" href="https://github.com/XY-boy/LGTD" rel="external noopener nofollow">this https URL</a>

Video-to-Video Translation with Global Temporal Consistency.

Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation.

A Global Approach for Video Matching

Learning Blind Video Temporal Consistency

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Video Captioning Using Global-Local Representation

Blind Video Temporal Consistency via Deep Video Prior

How Video Super-Resolution and Frame Interpolation Mutually Benefit

Temporally consistent video colorization with deep feature propagation and self-regularization learning

Exemplar-based video colorization with long-term spatiotemporal dependency

Towards Interpretable Video Super-Resolution Via Alternating Optimization

Video Super-Resolution With Temporal Group Attention

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Adaptive Compact Attention For Few-shot Video-to-video Translation

Video Diffusion Models with Local-Global Context Guidance

Local-Global Temporal Difference Learning for Satellite Video Super-Resolution

Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

Video Captioning With Temporal And Region Graph Convolution Network