Abstract: Existing diffusion-based methods have achieved impressive results in human motion editing. However, they often exhibit significant ghosting and body distortion on unseen in-the-wild cases. In this paper, we introduce Edit-Your-Motion, a video motion editing method that tackles these challenges through one-shot fine-tuning on unseen cases. First, we use DDIM inversion to initialize the noise, which preserves the appearance of the source video, and we design a lightweight motion attention adapter module to enhance motion fidelity. DDIM inversion obtains implicit representations by estimating the predicted noise from the source video; these representations serve as the starting point of the sampling process and ensure appearance consistency between the source and edited videos. The Motion Attention Module (MA) enhances the model's motion editing ability by resolving the conflict between the skeleton features and the appearance features. Second, to effectively decouple the motion and appearance of the source video, we design a spatio-temporal two-stage learning strategy (STL). In the first stage, we focus on learning the temporal features of human motion and propose recurrent causal attention (RCA) to ensure consistency between video frames. In the second stage, we shift the focus to learning the appearance features of the source video. With Edit-Your-Motion, users can edit the motion of humans in the source video, creating more engaging and diverse content. Extensive qualitative and quantitative experiments, along with user preference studies, show that Edit-Your-Motion outperforms other methods.
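To make the role of DDIM inversion in the abstract concrete, the following is a minimal NumPy sketch of the generic deterministic DDIM update run in both directions. It is not the authors' implementation: the noise schedule, the function names (`make_alpha_bars`, `ddim_move`, `ddim_invert`, `ddim_sample`), and the constant stand-in noise predictor `toy_eps` are all assumptions made for illustration, whereas a real system would use a trained denoising network on video latents.

```python
import numpy as np

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal levels; index 0 (value 1.0) is the clean input.
    The linear beta schedule is an assumption, not taken from the paper."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update between two noise levels.
    Estimates the clean sample from x and eps, then re-noises it to a_to."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alpha_bars):
    """Walk from the clean input towards noise (inversion)."""
    x = x0
    for t in range(len(alpha_bars) - 1):
        eps = eps_model(x, t)
        x = ddim_move(x, eps, alpha_bars[t], alpha_bars[t + 1])
    return x

def ddim_sample(x_T, eps_model, alpha_bars):
    """Walk from the inverted noise back towards a clean sample."""
    x = x_T
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_move(x, eps, alpha_bars[t], alpha_bars[t - 1])
    return x

def toy_eps(x, t):
    """Hypothetical stand-in for a trained noise predictor (constant output)."""
    return np.full_like(x, 0.5)

alpha_bars = make_alpha_bars(num_steps=50)
rng = np.random.default_rng(0)
source = rng.normal(size=(4, 8, 8))  # stand-in for 4 frames of 8x8 latents

latent = ddim_invert(source, toy_eps, alpha_bars)   # starting noise
recon = ddim_sample(latent, toy_eps, alpha_bars)    # sampling from it
```

With the constant predictor used here, sampling from the inverted noise recovers the source exactly, which is the property that lets the inverted noise carry the source appearance into the edited result. With a learned predictor the round trip is only approximate, because inversion reuses the noise estimate at step t as a proxy for the one at step t+1.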

Dreamix: Video Diffusion Models are General Video Editors

Pix2Video: Video Editing using Image Diffusion

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Diffusion Model-Based Video Editing: A Survey

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

Towards motion from video diffusion models

MoVideo: Motion-Aware Video Generation with Diffusion Models

Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

Structure and Content-Guided Video Synthesis with Diffusion Models

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing

DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

EffiVED:Efficient Video Editing via Text-instruction Diffusion Models

Video Diffusion Models