Video Diffusion Models: A Survey

Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter
2024-05-06
Abstract: Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website:
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses several key challenges in video generation, centered on how diffusion models can be used to generate coherent, high-quality videos. Specifically, it focuses on the following aspects:

1. **Temporal Consistency**: maintaining coherence across the generated video so that transitions between consecutive frames are smooth.
2. **Long Video Generation**: producing longer video sequences rather than clips only a few seconds long.
3. **Computational Cost**: reducing computation without sacrificing generation quality, so that models are efficient enough for practical applications.

The paper systematically reviews the key elements of diffusion models for video generation, including architecture selection, temporal dynamics modeling, and training modes. It summarizes recent research progress and identifies current challenges and future directions.
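To make the setting concrete, here is a minimal sketch of the standard DDPM-style forward (noising) process applied to a whole video clip. It is illustrative only and is not the survey's method. It assumes a linear beta schedule of 1000 steps from 1e-4 to 0.02, a clip stored as a (frames, channels, height, width) array, and one shared timestep for all frames. The function names `linear_alpha_bar` and `noise_video` and the 16-frame 64×64 clip are hypothetical.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_video(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    x0 has shape (frames, channels, height, width). The same timestep t is
    used for every frame, so the clip is noised jointly as one sample.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
# Hypothetical 16-frame RGB clip at 64x64, with values scaled to [-1, 1].
clip = rng.uniform(-1.0, 1.0, size=(16, 3, 64, 64))
xt, eps = noise_video(clip, t=500, alpha_bar=alpha_bar, rng=rng)
print(xt.shape)  # (16, 3, 64, 64)

# The number of values the denoiser must process grows linearly with frame count.
for frames in (16, 64, 256):
    print(frames, frames * 3 * 64 * 64)
```

The loop prints 196608, 786432, and 3145728 values for 16, 64, and 256 frames. This growth in the frame count is one reason long video generation and computational cost appear together among the challenges above. Full self-attention over all space-time positions would grow faster still, since its cost scales quadratically with sequence length.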