MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

2024-05-31

Abstract: Despite impressive advancements in diffusion-based video editing models in altering video attributes, there has been limited exploration into modifying motion information while preserving the original protagonist's appearance and background. In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing. To introduce conditional controls to the denoising process, MotionFollower leverages two of our proposed lightweight signal controllers, one for poses and the other for appearances, both of which consist of convolution blocks without involving heavy attention calculations. Further, we design a score guidance principle based on a two-branch architecture, including the reconstruction and editing branches, which significantly enhances the modeling capability of texture details and complicated backgrounds. Concretely, we enforce several consistency regularizers and losses during the score estimation. The resulting gradients thus inject appropriate guidance to the intermediate latents, forcing the model to preserve the original background details and protagonists' appearances without interfering with the motion modification. Experiments demonstrate the competitive motion editing ability of MotionFollower qualitatively and quantitatively. Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory while delivering superior motion editing performance and exclusively supporting large camera movements and actions.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the problem of modifying motion in a video while keeping the appearance of the main character and the background unchanged. Existing diffusion models have made significant progress in changing video attributes (such as style transfer and manipulation of the background and the main character's appearance), but they fall short when it comes to modifying motion information. To this end, the authors propose a lightweight score-guided diffusion model called MotionFollower. The main contributions of the paper are:

1. Two lightweight signal controllers (a pose controller and a reference controller) built from convolution blocks, which replace heavier attention-based network structures and thereby reduce computational cost.
2. A new score guidance principle that allows the model to retain the semantic details of the source video during inference, such as the background and camera movements.
3. An approximately 80% reduction in GPU memory usage compared with MotionEditor, the state-of-the-art motion editing model, together with support for videos that contain large camera movements and complex backgrounds.

Through these improvements, MotionFollower improves the quality of video motion editing while significantly reducing the demand for computational resources.
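The score guidance idea described above (gradients of consistency losses steering the intermediate latents of the editing branch toward the reconstruction branch) can be illustrated with a toy sketch. This is not the authors' implementation: the latents are plain lists of floats, the single masked squared-error loss stands in for the paper's "several consistency regularizers and losses", and every name (`background_consistency_loss`, `guided_step`, `guidance_scale`, the mask) is a hypothetical chosen for illustration.

```python
# Toy sketch of score guidance on a latent. NOT the MotionFollower code:
# latents are flat lists of floats and all names here are hypothetical.

def background_consistency_loss(edit_latent, recon_latent, background_mask):
    """Masked squared error between the two branches on background positions."""
    return sum(
        m * (e - r) ** 2
        for e, r, m in zip(edit_latent, recon_latent, background_mask)
    )


def loss_gradient(edit_latent, recon_latent, background_mask):
    """Analytic gradient of the loss above with respect to edit_latent."""
    return [
        2.0 * m * (e - r)
        for e, r, m in zip(edit_latent, recon_latent, background_mask)
    ]


def guided_step(edit_latent, recon_latent, background_mask, guidance_scale=0.25):
    """Move the editing latent against the loss gradient.

    In a real sampler this update would be interleaved with denoising steps;
    here only the guidance update itself is shown.
    """
    grad = loss_gradient(edit_latent, recon_latent, background_mask)
    return [e - guidance_scale * g for e, g in zip(edit_latent, grad)]


# Reconstruction branch latent (assumed to encode the source video).
recon_latent = [1.0, 2.0, 3.0, 4.0]
# Editing branch latent (assumed to encode the motion-edited video).
edit_latent = [0.0, 0.0, 0.0, 0.0]
# 1.0 marks background positions to preserve; 0.0 marks positions left free
# so that the motion edit is not disturbed.
background_mask = [1.0, 1.0, 0.0, 0.0]

loss_before = background_consistency_loss(edit_latent, recon_latent, background_mask)
guided_latent = guided_step(edit_latent, recon_latent, background_mask)
loss_after = background_consistency_loss(guided_latent, recon_latent, background_mask)
```

In this sketch the masked positions move toward the reconstruction latent, so the loss falls, while the unmasked positions are untouched. That mirrors, in a very reduced form, the stated goal of preserving background details without interfering with the motion modification.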

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

MotionCrafter: One-Shot Motion Customization of Diffusion Models

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

COMD: Training-free Video Motion Transfer with Camera-Object Motion Disentanglement

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

ReVideo: Remake a Video with Motion and Content Control

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Dreamix: Video Diffusion Models are General Video Editors

DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

High-Fidelity Diffusion Editor for Zero-Shot Text-Guided Video Editing

DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

StableVideo: Text-driven Consistency-aware Diffusion Video Editing