MotionMaster: Training-free Camera Motion Transfer For Video Generation

Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

2024-05-01

Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module and require substantial computational resources because of the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving-object regions from the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in the temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions, giving our model more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera and object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the lack of flexible, training-free camera motion control in video generation. Existing methods rely on training a temporal camera module, which is computationally expensive given the large number of parameters in video generation models, and they fix the set of camera motion types at training time. To solve this, the paper proposes MotionMaster (called COMD in the abstract), a training-free video motion transfer model that disentangles the camera motion and object motion in a source video and transfers the extracted camera motion to new videos. This enables control of complex camera motions, reduces training costs, and provides more flexible and diverse camera motion control, including combinations of different camera motion types.
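The abstract describes estimating the camera motion inside moving-object regions from the surrounding background motion by solving a Poisson equation. The sketch below illustrates only that general idea, not the authors' implementation: it fills a masked region of a 2D field using the simplest case, a Poisson equation with a zero right-hand side (Laplace's equation), solved by Jacobi iteration with the known background values as boundary conditions. The function name `harmonic_fill`, the 2D array representation, and the zero guidance term are all assumptions made for illustration; the paper operates on temporal attention maps and may use a different right-hand side and solver.

```python
import numpy as np


def harmonic_fill(values, mask, iters=2000):
    """Fill `values` where `mask` is True from the surrounding known values.

    Hypothetical helper, not from the paper. It solves Laplace's equation
    (a Poisson equation with zero right-hand side) by Jacobi iteration,
    treating unmasked cells as fixed Dirichlet boundary values.
    """
    out = values.astype(float).copy()
    # Initial guess for the unknown region: mean of the known background.
    out[mask] = out[~mask].mean()
    for _ in range(iters):
        # Edge padding lets border cells reuse their own value as a neighbor.
        padded = np.pad(out, 1, mode="edge")
        neighbor_avg = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        # Only the masked (moving-object) cells are updated.
        out[mask] = neighbor_avg[mask]
    return out


if __name__ == "__main__":
    # Toy "camera motion" field: a horizontal ramp, as a smooth stand-in.
    field = np.tile(np.arange(8.0), (8, 1))
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:6, 2:6] = True          # pretend a moving object covers this block
    corrupted = field.copy()
    corrupted[mask] = 99.0         # object motion pollutes the region
    filled = harmonic_fill(corrupted, mask)
    print(np.abs(filled - field).max())  # small: the ramp is recovered
```

A linear ramp satisfies the discrete Laplace equation exactly, so it makes a convenient sanity check: the filled region should match the background trend. For larger masks, a direct sparse solver would likely be preferable to Jacobi iteration, which converges slowly.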

COMD: Training-free Video Motion Transfer with Camera-Object Motion Disentanglement

Training-free Camera Control for Video Generation

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

Video Diffusion Models are Training-free Motion Interpreter and Controller

MVOC: a training-free multiple video object composition method with diffusion models

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Boosting Camera Motion Control for Video Diffusion Transformers

MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Cinematographic Camera Diffusion Model

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

CPA: Camera-pose-awareness Diffusion Transformer for Video Generation

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling