Abstract: The emergence of diffusion models has greatly advanced image and video generation. Recently, efforts have been made in controllable video generation, including text-to-video and image-to-video generation, video editing, and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module and require substantial computational resources because of the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility and prevents them from realizing specific camera controls, such as the varied camera movements used in films. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving-object region from the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in the temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions, giving our model more controllable and flexible camera control.
Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control. Project Page: https://sjtuplayer.github.io/projects/MotionMaster.
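
The one-shot step described above, estimating camera motion inside the moving-object region from the surrounding background motion, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the simplest form of the problem: a zero source term (so the Poisson equation reduces to Laplace's equation), a single scalar motion component on a pixel grid, background values held fixed as boundary conditions, and a plain Gauss-Seidel solver. The function name `fill_masked_motion` and its parameters are hypothetical, and the code is kept dependency-free for clarity rather than speed.

```python
def fill_masked_motion(field, mask, iters=200):
    """Fill masked cells of one motion component by relaxing Laplace's equation.

    field: 2D list of floats (hypothetical per-pixel motion component).
    mask:  2D list of bools, True where the value is unknown (moving object).
    Assumption: zero source term; the method in the abstract may use a
    different right-hand side.
    """
    h, w = len(field), len(field[0])
    # Copy the field; unknown (masked) cells start at zero.
    out = [[0.0 if mask[y][x] else float(field[y][x]) for x in range(w)]
           for y in range(h)]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue  # background motion acts as a fixed boundary value
                # Gauss-Seidel update: average of the in-grid 4-neighbours.
                neighbours = [out[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x),
                                             (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]
                out[y][x] = sum(neighbours) / len(neighbours)
    return out


# Toy example: background motion is a horizontal ramp; a 3x3 object region is unknown.
field = [[float(x) for x in range(7)] for _ in range(7)]
mask = [[2 <= y <= 4 and 2 <= x <= 4 for x in range(7)] for y in range(7)]
filled = fill_masked_motion(field, mask)
```

Because a linear ramp is harmonic, the filled region in this toy case should converge back to the ramp itself, which makes it a convenient sanity check. A practical version would operate on the actual motion representation and would likely use a sparse linear solver instead of fixed-count relaxation.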

AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

COMD: Training-free Video Motion Transfer with Camera-Object Motion Disentanglement

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Human Motion Diffusion as a Generative Prior

Disentangled Motion Modeling for Video Frame Interpolation

Video Motion Transfer with Diffusion Transformers

Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

PhysDiff: Physics-Guided Human Motion Diffusion Model

MoVideo: Motion-Aware Video Generation with Diffusion Models

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion

R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction

AAMDM: Accelerated Auto-regressive Motion Diffusion Model