Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang

2024-07-02

Abstract: Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks without training. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained image diffusion model. By replacing the self-attention layer with the proposed cross-previous-frame attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between neighboring frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy for better temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based zero-shot image restoration or enhancement method to further improve its performance. Experimental results demonstrate the superiority of our proposed method in producing temporally consistent videos with better fidelity.
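To make the cross-previous-frame attention idea concrete, here is a minimal sketch in plain Python. It assumes one common way of realizing such a layer: queries come from the current frame, while keys and values come from the previous frame. The abstract does not spell out this detail, so treat it as an assumption. The learned query/key/value projections of a real attention layer are omitted, and the names `attention` and `cross_previous_frame_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def cross_previous_frame_attention(curr_tokens, prev_tokens):
    """Assumed variant: queries from the current frame,
    keys and values from the previous frame."""
    return attention(curr_tokens, prev_tokens, prev_tokens)


# Self-attention would instead be attention(curr, curr, curr).
curr = [[0.3, -0.7], [1.5, 0.2]]
prev = [[1.0, 2.0], [1.0, 2.0]]
mixed = cross_previous_frame_attention(curr, prev)
```

Because every output token is a weighted average of previous-frame values, the current frame's features are pulled toward the previous frame's content, which is the intuition behind using such a layer to reduce flicker.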

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses zero-shot video restoration and enhancement. Specifically, the authors propose a framework built on a pre-trained image diffusion model. Existing diffusion-based zero-shot image restoration and enhancement methods produce severe temporal flickering artifacts when applied directly to videos. To solve this problem, the authors propose four key techniques:

1. **Cross-Previous-Frame Attention**: Replaces the self-attention layer so that each frame can use information from the previous frame, improving temporal consistency.
2. **Temporal Consistency Guidance**: Guides the generation process by calculating optical flow and occlusion masks to maintain temporal consistency.
3. **Spatial-Temporal Noise Sharing**: Shares noise between different frames to reduce temporal flickering.
4. **Early Stopping Sampling Strategy**: Stops sampling early in the reverse diffusion process to avoid generating high-frequency noise in the later stages.

Together, these techniques enable the pre-trained image diffusion model to maintain temporal consistency in video restoration and enhancement tasks, thereby improving the fidelity of the videos. Experimental results show that the method generates temporally consistent videos, especially in tasks such as low-light video enhancement, video super-resolution, video restoration, and video colorization.
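The last three techniques can be illustrated with a small plain-Python sketch. Everything here is a simplification under stated assumptions, not the authors' implementation: frames are flat lists of pixel values, the previous frame is assumed to be already warped to the current frame by an optical-flow estimator that is not shown, noise sharing is reduced to its simplest form (one noise map reused for every frame), and the function names are hypothetical.

```python
import random


def masked_temporal_loss(curr, warped_prev, visible):
    """Mean squared difference between the current frame and the
    flow-warped previous frame, counted only where visible == 1
    (occluded pixels, marked 0, are ignored)."""
    total = sum(m * (c - p) ** 2
                for c, p, m in zip(curr, warped_prev, visible))
    count = sum(visible)
    return total / count if count else 0.0


def shared_noise(num_frames, num_pixels, seed=0):
    """Simplest assumed form of noise sharing: draw one Gaussian
    noise map and reuse a copy of it for every frame."""
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    return [list(base) for _ in range(num_frames)]


def sampling_steps(total_steps, stop_step):
    """Timesteps visited by reverse diffusion when sampling stops
    early at stop_step instead of running down to step 0."""
    return list(range(total_steps - 1, stop_step - 1, -1))


curr = [1.0, 2.0, 3.0]
warped_prev = [1.0, 0.0, 3.0]
loss_all = masked_temporal_loss(curr, warped_prev, [1, 1, 1])
loss_masked = masked_temporal_loss(curr, warped_prev, [1, 0, 1])
noise = shared_noise(num_frames=3, num_pixels=4)
steps = sampling_steps(total_steps=10, stop_step=3)
```

In a guidance setting, a loss like `masked_temporal_loss` would be differentiated with respect to the current estimate and used to nudge each sampling step; the occlusion mask keeps pixels with no valid correspondence from contributing to that correction.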

DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

DIVD: Deblurring with Improved Video Diffusion Model

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Efficient and consistent zero-shot video generation with diffusion models

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency

Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

High-Fidelity Diffusion Editor for Zero-Shot Text-Guided Video Editing

Fine-gained Zero-shot Video Sampling

PixRevive: Latent Feature Diffusion Model for Compressed Video Quality Enhancement

Efficient Diffusion Model for Image Restoration by Residual Shifting

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

Training-free Prior Guided Diffusion Model for Zero-Reference Low-Light Image Enhancement

Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation

A Recurrent Video Quality Enhancement Framework with Multi-Granularity Frame-Fusion and Frame Difference Based Attention

Video Diffusion Models are Strong Video Inpainter