Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Abhinav Shrivastava, Gaurav Shrivastava
DOI: https://doi.org/10.1109/CVPR52733.2024.00691
2024-06-16
Computer Vision and Pattern Recognition
Abstract: Diffusion models have made significant strides in image generation, mastering tasks such as unconditional image synthesis, text-image translation, and image-to-image conversions. However, their capability falls short in the realm of video prediction, mainly because they treat videos as a collection of independent images, relying on external constraints such as temporal attention mechanisms to enforce temporal coherence. In our paper, we introduce a novel model class that treats video as a continuous multi-dimensional process rather than a series of discrete frames. Through extensive experimentation, we establish state-of-the-art performance in video prediction, validated on benchmark datasets including KTH, BAIR, Human3.6M, and UCF101. Video results are available on the paper's webpage.
Computer Science