3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin

2024-12-11

Abstract: This paper aims to manipulate multi-entity 3D motions in video generation. Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results. However, 2D control signals are inherently limited in expressing the 3D nature of object motions. To overcome this problem, we introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space, given user-desired 6DoF pose (location and rotation) sequences of entities. At the core of our approach is a plug-and-play 3D-motion grounded object injector that fuses multiple input entities with their respective 3D trajectories through a gated self-attention mechanism. In addition, we exploit an injector architecture to preserve the video diffusion prior, which is crucial for generalization ability. To mitigate video quality degradation, we introduce a domain adaptor during training and employ an annealed sampling strategy during inference. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectory and then captures their motion with 12 evenly-surround cameras on diverse 3D UE platforms. Extensive experiments show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions. Project page: http://fuxiao0719.github.io/projects/3dtrajmaster

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the problem of precisely controlling the 3D motion of multiple entities in video generation. Existing controllable video generation methods mainly rely on 2D control signals to manipulate object motion and have achieved remarkable synthesis results. However, 2D control signals are inherently limited in expressing the 3D nature of object motion; they cannot fully describe how an object translates and rotates in 3D space. The paper therefore proposes 3DTrajMaster, a controller that regulates the dynamics of multiple entities in 3D space given user-specified six-degree-of-freedom (6DoF) pose sequences (location and rotation) for each entity. Its core is a plug-and-play 3D-motion grounded object injector, which fuses multiple input entities with their respective 3D trajectories through a gated self-attention mechanism. The injector architecture is also chosen to preserve the video diffusion prior, which the authors consider crucial for generalization. To mitigate video quality degradation, a domain adaptor is introduced during training and an annealed sampling strategy is employed during inference. To address the lack of suitable training data, the authors construct the 360-Motion Dataset: collected 3D human and animal assets are first paired with GPT-generated trajectories, and their motion is then captured with 12 cameras placed evenly around the scene on diverse 3D UE platforms. According to the paper's experiments, 3DTrajMaster sets a new state of the art in both accuracy and generalization for controlling multi-entity 3D motion.
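To make the gated self-attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a GLIGEN-style formulation: each entity embedding is combined with an embedding of its pose, the resulting tokens are concatenated with the video tokens, self-attention runs over the joint sequence, and the update to the video tokens is scaled by `tanh(gate)`. The function names, the additive entity-plus-pose fusion, the single attention head, and the random weights are all illustrative assumptions; the summary above does not specify these details.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a (n, dim) sequence.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def gated_entity_injection(video_tokens, entity_tokens, pose_tokens, weights, gate):
    """Hypothetical injector step (assumed design, not the paper's code).

    video_tokens:  (n_video, dim) latent tokens of the video backbone
    entity_tokens: (n_entities, dim) one embedding per entity description
    pose_tokens:   (n_entities, dim) one embedding per entity 6DoF pose
    gate:          scalar; tanh(gate) scales the injected update
    """
    # Assumption: entity and pose embeddings are fused by simple addition.
    grounded = entity_tokens + pose_tokens
    joint = np.concatenate([video_tokens, grounded], axis=0)
    attended = self_attention(joint, *weights)
    # Keep only the outputs at the video-token positions.
    update = attended[: video_tokens.shape[0]]
    # Residual connection: gate = 0 leaves the backbone output unchanged.
    return video_tokens + np.tanh(gate) * update


rng = np.random.default_rng(0)
dim = 8
video = rng.normal(size=(16, dim))
entities = rng.normal(size=(2, dim))   # two entities
poses = rng.normal(size=(2, dim))      # their pose embeddings for one frame
weights = tuple(rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

out_closed = gated_entity_injection(video, entities, poses, weights, gate=0.0)
out_open = gated_entity_injection(video, entities, poses, weights, gate=1.0)
```

With the gate at zero the injector is an identity on the video tokens, which is one plausible way a plug-in module can leave a pretrained video diffusion prior intact at the start of training; opening the gate lets the entity and pose tokens influence the video tokens.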

DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships

Sketch Based Multi-source 3D Animation Transfer

Motion Prompting: Controlling Video Generation with Motion Trajectories

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Trajectory Attention for Fine-grained Video Motion Control

InTraGen: Trajectory-controlled Video Generation for Object Interactions

AMG: Avatar Motion Guided Video Generation

COMD: Training-free Video Motion Transfer with Camera-Object Motion Disentanglement

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Trajevae: Controllable Human Motion Generation from Trajectories

DragAnything: Motion Control for Anything Using Entity Representation

Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography

Grasping Diverse Objects with Simulated Humanoids

TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Modeling Trajectories for 3D Motion Analysis

Action2video: Generating Videos of Human 3D Actions

Multi-Frame Content Integration with a Spatio-Temporal Attention Mechanism for Person Video Motion Transfer

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories