Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

2024-07-15

Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation that considers both visual quality and consistency of content. Consistency of content is ensured by a proposed loss function that maintains the structure of the input, while visual quality is improved by utilizing the denoising process of pretrained diffusion models. To address the formulated optimization problem, we have developed a plug-and-play noise optimization strategy, referred to as Noise Calibration. By refining the initial random noise through a few iterations, the content of the original video can be largely preserved, and the enhancement effect demonstrates a notable improvement. Extensive experiments have demonstrated the effectiveness of the proposed method.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to enhance a video with diffusion models so that visual quality improves while the content stays consistent with the original. Existing methods typically retrain an expert diffusion model and then refine the video with a noising-denoising process, which is costly to train and can still alter the content structure of the original video. To address this, the paper proposes a new formulation that accounts for both visual quality and content consistency, together with a plug-and-play noise optimization strategy called "Noise Calibration". Content consistency is enforced by a loss function that maintains the structure of the input, while visual quality comes from the denoising process of a pre-trained video diffusion model. The strategy only refines the initial random noise and requires no additional fine-tuning. After 1-3 iterations, the content of the original video is largely preserved and the enhancement effect improves noticeably. Experiments with a pre-trained text-to-video (T2V) diffusion model show that the method maintains content consistency while improving video quality. The paper also positions itself against related work on diffusion-based video generation, super-resolution, and video refinement models, emphasizing the importance of content consistency in video enhancement. Effectiveness is demonstrated through quantitative and qualitative evaluations as well as user studies, and the method can serve as a plug-in for existing visual refinement models to further improve their performance.
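To make the idea of refining the initial noise over a few iterations more concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm. A 1-D list stands in for a video, `toy_denoiser` is a hypothetical placeholder for a pre-trained diffusion model, the `[1, 2, 1] / 4` blur in `smooth` is an assumed stand-in for the structure-preserving loss, and the update rule in `calibrate_noise` is an illustrative assumption chosen so that the effect is easy to verify.

```python
import math
import random


def smooth(x):
    """Circular [1, 2, 1] / 4 blur: an assumed, crude proxy for 'structure'."""
    n = len(x)
    return [(x[i - 1] + 2.0 * x[i] + x[(i + 1) % n]) / 4.0 for i in range(n)]


def add_noise(x0, eps, alpha):
    """Forward diffusion step: x_t = sqrt(alpha) * x0 + sqrt(1 - alpha) * eps."""
    a, b = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [a * v + b * e for v, e in zip(x0, eps)]


def toy_denoiser(x_t, alpha):
    """Hypothetical placeholder for a pre-trained model: only undoes the scaling."""
    a = math.sqrt(alpha)
    return [v / a for v in x_t]


def structure_error(x0, eps, denoiser, alpha):
    """L2 distance between the blurred denoised estimate and the blurred input."""
    x0_hat = denoiser(add_noise(x0, eps, alpha), alpha)
    diff = [p - q for p, q in zip(smooth(x0_hat), smooth(x0))]
    return math.sqrt(sum(d * d for d in diff))


def calibrate_noise(x0, eps, denoiser, alpha=0.5, n_iters=3):
    """Nudge eps so the denoised estimate keeps the input's blurred structure.

    Illustrative update rule (an assumption, not taken from the paper):
    subtract the structure residual, rescaled into noise units.
    """
    scale = math.sqrt(alpha / (1.0 - alpha))
    for _ in range(n_iters):
        x0_hat = denoiser(add_noise(x0, eps, alpha), alpha)
        residual = [p - q for p, q in zip(smooth(x0_hat), smooth(x0))]
        eps = [e - scale * r for e, r in zip(eps, residual)]
    return eps


# Toy usage: a sine wave plays the role of the input video.
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]

calibrated = calibrate_noise(signal, noise, toy_denoiser)
before = structure_error(signal, noise, toy_denoiser, 0.5)
after = structure_error(signal, calibrated, toy_denoiser, 0.5)
```

In this toy setting the structure error with the calibrated noise (`after`) is smaller than with the original random noise (`before`), which mirrors the summary's claim that a few refinement iterations help preserve content. With a real pre-trained video diffusion model the denoiser is nonlinear, so the actual loss and update used in the paper may differ substantially from this sketch.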

CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion

SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models

Efficiency-optimized Video Diffusion Models

A Quality Enhancement Framework with Noise Distribution Characteristics for High Efficiency Video Coding

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

A Diffusion Model Based Quality Enhancement Method for HEVC Compressed Video

PixRevive: Latent Feature Diffusion Model for Compressed Video Quality Enhancement

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Practical Real Video Denoising with Realistic Degradation Model

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Estimating Fine-Grained Noise Model via Contrastive Learning

Learning Model-Blind Temporal Denoisers without Ground Truths

Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising

Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

Improved Noise Schedule for Diffusion Training

Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency