Abstract: Event-based video reconstruction has attracted increasing attention owing to advantages such as high dynamic range and the ability to capture rapid motion. However, current methods often prioritize extracting temporal information from the continuous event stream. This leads them to overemphasize low-frequency texture features of the scene, which results in over-smoothing and blurry artifacts. Addressing this challenge requires conditional information that spans temporal features, low-frequency texture, and high-frequency events, so that the Denoising Diffusion Probabilistic Model (DDPM) can be guided toward accurate and natural outputs. To this end, we introduce the Temporal Residual Guided Diffusion Framework, which leverages both temporal and frequency-based event priors. The framework incorporates three conditioning modules: a pre-trained low-frequency intensity estimation module, a temporal recurrent encoder module, and an attention-based high-frequency prior enhancement module. To capture temporal scene variations from the events at the current moment, we use a temporal-domain residual image as the target of the diffusion model. By combining these three conditioning paths with the temporal residual formulation, our framework reconstructs high-quality videos from event streams and mitigates the artifacts and over-smoothing commonly observed in previous approaches. Extensive experiments on multiple benchmark datasets show that our framework outperforms prior event-based reconstruction methods.
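As a rough illustration of the temporal-residual idea, the NumPy sketch below treats the difference between the current and previous frames as the diffusion target and noises it with the standard DDPM forward process. It assumes a linear beta schedule, a residual defined as "current frame minus previous frame", and plain channel concatenation of the three conditioning signals (the abstract describes an attention-based high-frequency module, which is not modeled here). All function names and the random stand-in signals are hypothetical; this is not the framework's implementation.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear DDPM variance schedule (an assumed choice, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def temporal_residual(current_frame: np.ndarray, previous_frame: np.ndarray) -> np.ndarray:
    """Hypothetical residual target: what changed between consecutive frames."""
    return current_frame - previous_frame


def q_sample(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise


def predict_x0(xt: np.ndarray, t: int, alphas_cumprod: np.ndarray, eps_hat: np.ndarray) -> np.ndarray:
    """Invert the forward process given a noise estimate eps_hat (e.g. from a denoiser)."""
    abar = alphas_cumprod[t]
    return (xt - np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(abar)


def build_denoiser_input(xt, low_freq, temporal_feat, high_freq):
    """Simplified conditioning: stack the noisy residual with three condition maps
    along a channel axis (a stand-in for the paper's conditioning modules)."""
    return np.stack([xt, low_freq, temporal_feat, high_freq], axis=0)


def reconstruct_frame(previous_frame: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Recover the current frame by adding the residual back to the previous frame."""
    return previous_frame + residual


# Usage with random stand-in data (H = W = 4).
rng = np.random.default_rng(0)
prev = rng.random((4, 4))
curr = rng.random((4, 4))

r0 = temporal_residual(curr, prev)                 # diffusion target
betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)

t = 500
eps = rng.standard_normal(r0.shape)
xt = q_sample(r0, t, alphas_cumprod, eps)          # noisy residual at step t

# Hypothetical condition maps (random placeholders for the three conditioning paths).
cond_low, cond_temporal, cond_high = (rng.random((4, 4)) for _ in range(3))
net_input = build_denoiser_input(xt, cond_low, cond_temporal, cond_high)

# With a perfect noise estimate, the residual and the frame are recovered exactly.
r0_hat = predict_x0(xt, t, alphas_cumprod, eps)
curr_hat = reconstruct_frame(prev, r0_hat)
```

Predicting a residual rather than the full frame means the diffusion model only has to generate what changed since the previous frame, which matches the abstract's stated motivation of capturing temporal scene variations from the current events.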

TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach

Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach

Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

PixRevive: Latent Feature Diffusion Model for Compressed Video Quality Enhancement

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

MV-Diffusion: Motion-aware Video Diffusion Model

Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

Blended Latent Diffusion under Attention Control for Real-World Video Editing

JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation