Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski
2024-05-15
Abstract: We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain: given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to realistically interact with objects in real pictures by interpreting the spectral volumes as image-space modal bases, which approximate object dynamics.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problems This Paper Attempts to Solve

This paper primarily attempts to address the following issues:

1. **Generating Dynamic Videos from a Single Image**:
   - Using a generative model to predict a spectral volume, which turns a static image into a looping video with natural oscillatory motion.
   - A spectral volume represents dense, long-range pixel trajectories in the Fourier domain, which suits the motion of naturally oscillating objects such as trees and flowers.
2. **Achieving Interactive Dynamic Simulation**:
   - Interpreting the spectral volume as an image-space modal basis to simulate how objects respond to forces applied by the user, such as dragging and releasing an object.
3. **Improving the Realism and Coherence of Animations**:
   - Improving long-term consistency and control precision by generating an intermediate motion representation (the spectral volume) rather than raw video frames directly.
4. **Overcoming Deficiencies in Existing Methods**:
   - Addressing problems that traditional video generation methods exhibit, such as incoherent motion, unrealistic texture changes, and violations of physical constraints.

Through these methods, the paper aims to produce more realistic and coherent dynamics, so that videos generated from a single image appear more natural.
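The idea of turning a spectral volume into per-frame pixel displacements can be illustrated with a short NumPy sketch. It assumes the conversion is an inverse Fourier transform along the temporal axis. The array layout `(K, H, W, 2)`, the function name `spectral_volume_to_motion_texture`, and the random input are hypothetical choices for illustration, not the paper's implementation. In the paper the spectral volume is predicted by a diffusion model; here it is filled with random values.

```python
import numpy as np


def spectral_volume_to_motion_texture(spectral_volume, num_frames):
    """Convert per-pixel Fourier coefficients into per-frame displacements.

    spectral_volume: complex array of shape (K, H, W, 2) holding the K lowest
        temporal frequency coefficients of each pixel's x and y displacement
        (assumed layout, not taken from the paper).
    num_frames: number of video frames T to synthesize.

    Returns a real array of shape (T, H, W, 2): the displacement of every
    pixel of the input image at each time step.
    """
    # irfft treats the input as the non-negative-frequency half of a
    # Hermitian spectrum and zero-pads the missing higher frequencies.
    return np.fft.irfft(spectral_volume, n=num_frames, axis=0)


# Hypothetical stand-in for a predicted spectral volume.
rng = np.random.default_rng(0)
K, H, W, T = 16, 4, 4, 60
volume = rng.normal(size=(K, H, W, 2)) + 1j * rng.normal(size=(K, H, W, 2))
volume[0] = 0  # zero mean displacement (no constant offset)

motion = spectral_volume_to_motion_texture(volume, T)
print(motion.shape)  # (60, 4, 4, 2)
```

Because an inverse discrete Fourier transform is periodic in its output length, the displacement field produced this way repeats after `num_frames` frames. A downstream renderer would still be needed to warp the input image by these displacements.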