Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan
2024-07-15
Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation that considers both visual quality and consistency of content. Consistency of content is ensured by a proposed loss function that maintains the structure of the input, while visual quality is improved by utilizing the denoising process of pretrained diffusion models. To address the formulated optimization problem, we have developed a plug-and-play noise optimization strategy, referred to as Noise Calibration. By refining the initial random noise through a few iterations, the content of the original video can be largely preserved, and the enhancement effect demonstrates a notable improvement. Extensive experiments have demonstrated the effectiveness of the proposed method.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper studies how to enhance videos with diffusion models while keeping the enhanced output consistent with the original content. Existing methods typically retrain an expert diffusion model and then refine the video through a noising-denoising process, which is costly to train and can still alter the content structure of the original video. To address this, the paper formulates enhancement as an optimization problem that accounts for both visual quality and content consistency, and solves it with a plug-and-play strategy called "Noise Calibration". Visual quality comes from the denoising process of a pre-trained video diffusion model, while a proposed loss function preserves the structure of the input. The strategy only refines the initial random noise and requires no additional fine-tuning; 1-3 iterations are reported to be enough to largely preserve the original content while noticeably improving the enhancement effect. Experiments show that the method maintains content consistency and improves video quality when a pre-trained text-to-video (T2V) diffusion model is used for enhancement. The paper also relates the method to prior work on diffusion-based video generation, super-resolution, and video refinement, emphasizing the importance of content consistency in video enhancement. Effectiveness is demonstrated through quantitative and qualitative evaluations as well as user studies, and the method can serve as a plugin for existing visual refinement models to further improve their performance.
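To make the idea of refining the initial noise over a few iterations more concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The low-pass filter used as a stand-in for "structure", the update rule, and the names `low_pass`, `calibrate_noise`, `structure_drift`, and `eps_model` are all assumptions chosen for illustration. Only the forward noising formula is the standard diffusion one.

```python
import numpy as np

def add_noise(video, noise, alpha_bar):
    # Standard diffusion forward step: blend the clean video with Gaussian noise.
    return np.sqrt(alpha_bar) * video + np.sqrt(1.0 - alpha_bar) * noise

def predict_clean(noisy, eps_hat, alpha_bar):
    # Invert the forward step using the model's noise estimate.
    return (noisy - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

def low_pass(video, cutoff=0.125):
    # ASSUMPTION: "structure" is approximated by low spatial frequencies per frame.
    h, w = video.shape[-2:]
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    mask = (fy <= cutoff) & (fx <= cutoff)
    spectrum = np.fft.fft2(video, axes=(-2, -1))
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real

def structure_drift(video, noise, eps_model, alpha_bar):
    # Largest low-frequency gap between the one-step prediction and the input.
    noisy = add_noise(video, noise, alpha_bar)
    clean_hat = predict_clean(noisy, eps_model(noisy), alpha_bar)
    return float(np.max(np.abs(low_pass(clean_hat) - low_pass(video))))

def calibrate_noise(video, noise, eps_model, alpha_bar, iterations=3):
    # ASSUMPTION: illustrative update, not necessarily the paper's exact rule.
    # Each pass shifts the noise so the predicted clean video agrees with the
    # input in its low frequencies, leaving high frequencies to the denoiser.
    scale = np.sqrt(alpha_bar / (1.0 - alpha_bar))
    for _ in range(iterations):
        noisy = add_noise(video, noise, alpha_bar)
        clean_hat = predict_clean(noisy, eps_model(noisy), alpha_bar)
        drift = low_pass(clean_hat) - low_pass(video)
        noise = noise - scale * drift
    return noise
```

Here `eps_model` is a hypothetical callable that maps a noisy video array of shape (frames, height, width) to a noise estimate of the same shape. In practice it would wrap a pre-trained video diffusion model at a fixed timestep. The calibrated noise would then replace the random noise at the start of the usual noising-denoising refinement, which is what would make such a scheme plug-and-play.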