Trajectory Attention for Fine-grained Video Motion Control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan
2024-11-29
Abstract: Recent advancements in video generation have been greatly driven by video diffusion models, with camera motion control emerging as a crucial challenge in creating view-customized visual content. This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control. Unlike existing methods that often yield imprecise outputs or neglect temporal correlations, our approach possesses a stronger inductive bias that seamlessly injects trajectory information into the video generation process. Importantly, our approach models trajectory attention as an auxiliary branch alongside traditional temporal attention. This design enables the original temporal attention and the trajectory attention to work in synergy, ensuring both precise motion control and new content generation capability, which is critical when the trajectory is only partially available. Experiments on camera motion control for images and videos demonstrate significant improvements in precision and long-range consistency while maintaining high-quality generation. Furthermore, we show that our approach can be extended to other video motion control tasks, such as first-frame-guided video editing, where it excels in maintaining content consistency over large spatial and temporal ranges.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses **fine-grained camera motion control** in video generation: how to control camera motion precisely and consistently when generating videos, so that content can be rendered from customized viewpoints. Although current video generation models, such as video diffusion models, have made significant progress in synthesizing realistic videos, camera motion control remains challenging in two main respects:

1. **Imprecise output**: Existing methods tend to produce blurry or inaccurate results.
2. **Ignoring temporal correlation**: Many methods overlook the temporal relationships between frames, so the generated videos lack consistency over long time ranges.

To address these problems, the authors propose **Trajectory Attention**. The method performs attention along the available pixel trajectories, which injects trajectory information into the video generation process and enables finer-grained camera motion control. Unlike approaches that rely on temporal attention alone, Trajectory Attention works alongside traditional temporal attention as an auxiliary branch. Together, the two branches provide both precise motion control and the ability to generate new content, which matters most when the trajectory is only partially available.

### Specific improvements of Trajectory Attention

- **Stronger inductive bias**: Trajectory Attention utilizes 3D consistency constraints, making feature alignment along the trajectory more natural and precise.
- **Long-range consistency**: Compared with traditional temporal attention, Trajectory Attention keeps features coherent over long ranges, improving the quality and consistency of the generated videos.
- **Flexibility**: The method applies not only to camera motion control but also extends to other video motion control tasks, such as first-frame-guided video editing, where it maintains strong content consistency.

### Experimental results

Experiments show that Trajectory Attention significantly improves precision and long-range consistency in camera motion control for both images and videos while maintaining high-quality generation. The method also performs well in other video motion control tasks, for example by maintaining content consistency over large spatial and temporal ranges.

Overall, the paper addresses the shortcomings of existing video generation models in camera motion control by introducing the Trajectory Attention mechanism, which provides more precise and consistent control.
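
The core idea described above, attention along pixel trajectories added as an auxiliary branch next to temporal attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes single-head attention without learned projections, integer pixel trajectories, a boolean validity mask for partially available trajectories, and a plain sum to combine the branches. All function names (`self_attention`, `temporal_attention`, `trajectory_attention`, `combined_attention`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, mask=None):
    """Single-head self-attention over the sequence axis.

    x: (..., T, C). mask: (..., T) bool, True where a key is valid.
    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # (..., T, T)
    if mask is not None:
        scores = np.where(mask[..., None, :], scores, -1e9)  # hide invalid keys
    return softmax(scores, axis=-1) @ x

def temporal_attention(feats):
    """Attend across frames at each fixed pixel location. feats: (T, H, W, C)."""
    T, H, W, C = feats.shape
    x = feats.reshape(T, H * W, C).transpose(1, 0, 2)  # (H*W, T, C)
    return self_attention(x).transpose(1, 0, 2).reshape(T, H, W, C)

def trajectory_attention(feats, traj, valid):
    """Attend across frames along pixel trajectories.

    feats: (T, H, W, C) features.
    traj:  (N, T, 2) integer (row, col) of each trajectory in each frame.
    valid: (N, T) bool, False where the trajectory is unavailable.
    """
    T, H, W, _ = feats.shape
    rows = np.clip(traj[..., 0], 0, H - 1)
    cols = np.clip(traj[..., 1], 0, W - 1)
    frames = np.broadcast_to(np.arange(T), rows.shape)  # (N, T)
    gathered = feats[frames, rows, cols]                # (N, T, C)
    attended = self_attention(gathered, mask=valid)
    out = np.zeros_like(feats)
    # Scatter back only the valid points; if two trajectories share a pixel,
    # the last write wins in this simplified version.
    out[frames[valid], rows[valid], cols[valid]] = attended[valid]
    return out

def combined_attention(feats, traj, valid):
    # Assumption: the two branches are combined by a plain sum.
    return temporal_attention(feats) + trajectory_attention(feats, traj, valid)
```

In this sketch, pixels that no valid trajectory covers receive a contribution only from the temporal branch, which mirrors the point that temporal attention remains responsible for content where the trajectory is only partially available.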