Abstract: Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the following problems: (1) DMs require many iteration steps to generate videos from Gaussian noise, which consumes many computational resources. (2) DMs are easily misled by the blurry artifacts in the video, resulting in irrational content and distortion of the deblurred video. To address the above issues, we propose a novel video deblurring framework VD-Diff that integrates the diffusion model into the Wavelet-Aware Dynamic Transformer (WADT). Specifically, we perform the diffusion model in a highly compact latent space to generate prior features containing high-frequency information that conforms to the ground truth distribution. We design the WADT to preserve and recover the low-frequency information in the video while utilizing the high-frequency information generated by the diffusion model. Extensive experiments show that our proposed VD-Diff outperforms SOTA methods on GoPro, DVD, BSD, and Real-World Video datasets.

What problem does this paper attempt to address?

This paper addresses the restoration of high-frequency information in video deblurring. Current video deblurring methods struggle to recover high-frequency details because regression losses are conservative with such details. Diffusion Models (DMs) are strong at generating high-frequency details, so the authors introduce them into the video deblurring task. However, applying a diffusion model directly to video deblurring raises two main problems:

1. **High consumption of computational resources**: The diffusion model requires a large number of iterative steps to generate videos from Gaussian noise, which consumes a large amount of computational resources.
2. **Easily misled by blur artifacts**: The diffusion model is easily misled by blur artifacts in the video, resulting in unreasonable or distorted content in the deblurred video.

To solve these problems, the authors propose a new video deblurring framework, VD-Diff, which integrates the diffusion model into the Wavelet-Aware Dynamic Transformer (WADT). Specifically, they run the diffusion model in a highly compact latent space to generate prior features that contain high-frequency information and conform to the ground-truth distribution. The WADT is designed to preserve and recover the low-frequency information in the video while using the high-frequency information generated by the diffusion model. Experimental results show that VD-Diff outperforms existing state-of-the-art methods on the GoPro, DVD, BSD, and Real-World Video datasets.
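The split between low-frequency and high-frequency information described above can be illustrated with a one-level 2D wavelet decomposition. The sketch below is only an illustration: it assumes an orthonormal Haar wavelet and a hand-made ramp in place of a real blur kernel, and the function names (`haar_dwt2`, `high_frequency_energy`) are hypothetical. This summary does not state which wavelet the WADT uses or how it consumes the subbands.

```python
def haar_dwt2(frame):
    """One-level 2D orthonormal Haar transform of a grayscale frame.

    `frame` is a list of rows with even height and width. Returns four
    half-resolution bands: the low-frequency approximation and three
    high-frequency detail bands (horizontal, vertical, diagonal differences).
    """
    height, width = len(frame), len(frame[0])
    if height % 2 or width % 2:
        raise ValueError("frame height and width must be even")
    low, horiz, vert, diag = [], [], [], []
    for i in range(0, height, 2):
        low_row, horiz_row, vert_row, diag_row = [], [], [], []
        for j in range(0, width, 2):
            # One 2x2 block: a b / c d
            a, b = frame[i][j], frame[i][j + 1]
            c, d = frame[i + 1][j], frame[i + 1][j + 1]
            low_row.append((a + b + c + d) / 2)    # local average
            horiz_row.append((a - b + c - d) / 2)  # left minus right
            vert_row.append((a + b - c - d) / 2)   # top minus bottom
            diag_row.append((a - b - c + d) / 2)   # diagonal difference
        low.append(low_row)
        horiz.append(horiz_row)
        vert.append(vert_row)
        diag.append(diag_row)
    return low, horiz, vert, diag


def energy(band):
    """Sum of squared coefficients in a band."""
    return sum(v * v for row in band for v in row)


def high_frequency_energy(frame):
    """Total energy in the three detail bands."""
    _, horiz, vert, diag = haar_dwt2(frame)
    return energy(horiz) + energy(vert) + energy(diag)


# A sharp vertical edge and a smoothed version of it (a hand-made ramp,
# standing in for blur; this is not a real blur kernel).
sharp_row = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
blurred_row = [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
sharp = [sharp_row[:] for _ in range(4)]
blurred = [blurred_row[:] for _ in range(4)]

hf_sharp = high_frequency_energy(sharp)
hf_blurred = high_frequency_energy(blurred)
print(hf_sharp, hf_blurred)
```

In this toy case the smoothed edge carries less detail-band energy than the sharp one, which is the kind of high-frequency content a deblurring method has to restore.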
### Formula Summary

- **Formula 1**: Modulation of the features by the highly compact latent prior feature \(z'\) to guide video restoration:
  \[ \hat{F}=W_{1}^{l}z'\odot\text{LN}(F_{aa}) + W_{2}^{l}z' \]
  where \(\odot\) represents element-wise multiplication, \(\text{LN}\) represents layer normalization, and \(W_{1}^{l}\) and \(W_{2}^{l}\) represent linear layers.
- **Formula 2**: Attention calculation:
  \[ F'_{\text{out}}=W_{a}\hat{V}\cdot\text{Softmax}\left(\frac{\hat{K}\cdot\hat{Q}}{\gamma}\right)+F_{aa} \]
  where \(\gamma\) is a learnable scaling parameter.
- **Formula 3**: Overall process of the WDA-FFN part:
  \[ F_{\text{out}}=G(W_{1}^{2}W_{1}^{3}\hat{F}'_{\text{out}})\odot W_{2}^{2}W_{2}^{3}\hat{F}'_{\text{out}}+\hat{F}'_{\text{out}} \]
  where \(G\) represents the Gaussian Error Linear Unit (GELU).
- **Formula 4**: Gaussian noise transition in the forward diffusion process:
  \[ q(z_{t}|z_{t - 1})=\mathcal{N}(z_{t};\sqrt{1-\beta_{t}}z_{t - 1},\beta_{t}I) \]
- **Formula 5**: Update step in the reverse denoising process:
  \[ z_{t - 1}=\frac{1}{\sqrt{\alpha_{t}}}\left(z_{t}-\epsilon\sqrt{1-\bar{\alpha}_{t}}\right) \]

Together, these components let VD-Diff target high-frequency information restoration in video deblurring and, according to the reported experiments, improve the quality of the deblurred videos.
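The forward transition in Formula 4 can be sketched on a small latent vector. This is a minimal illustration, not the authors' implementation: the linear schedule, its endpoint values, the four-step length, and the helper names (`linear_beta_schedule`, `forward_step`, `signal_retention`) are all assumptions chosen for the example. The last helper uses the standard identity that, after \(t\) steps, the original latent is scaled by \(\sqrt{\bar{\alpha}_{t}}\) with \(\bar{\alpha}_{t}=\prod_{s\le t}(1-\beta_{s})\).

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start, beta_end):
    """Evenly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_step(z_prev, beta_t, rng):
    """Sample z_t ~ N(sqrt(1 - beta_t) * z_{t-1}, beta_t * I) (Formula 4)."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in z_prev]


def signal_retention(betas):
    """sqrt(alpha_bar_t) per step: the factor multiplying z_0 inside z_t."""
    retention, alpha_bar = [], 1.0
    for beta_t in betas:
        alpha_bar *= 1.0 - beta_t
        retention.append(math.sqrt(alpha_bar))
    return retention


rng = random.Random(0)                       # fixed seed for repeatability
betas = linear_beta_schedule(4, 0.1, 0.99)   # hypothetical short schedule
z = [1.0, -0.5, 0.25, 2.0]                   # toy stand-in for a compact latent

trajectory = [z]
for beta_t in betas:
    z = forward_step(z, beta_t, rng)
    trajectory.append(z)

retention = signal_retention(betas)
print(retention)
```

The retention factor shrinks at every step, so the final latent is close to pure Gaussian noise. A compact latent vector is also cheap to process at each step, which relates to the computational cost concern raised in the summary.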

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
