Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

2024-06-13

Abstract: Pose-controllable character video generation is in high demand, with extensive applications in fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to produce incoherent animation in complex scenarios, such as multi-character animation and body occlusion. Additionally, current methods require large-scale, high-quality videos with stable backgrounds and temporal consistency as training data; otherwise, their performance deteriorates greatly. These two issues hinder the practical use of character image animation tools. In this paper, we propose a practical and robust framework, Follow-Your-Pose v2, which can be trained on noisy open-source videos readily available on the internet. Multi-condition guiders are designed to address the challenges of background stability, body occlusion in multi-character generation, and consistency of character appearance. Moreover, to fill the gap in fair evaluation of multi-character pose animation, we propose a new benchmark comprising approximately 4,000 frames. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics. Meanwhile, qualitative assessments reveal a significant improvement in the quality of the generated videos, particularly in scenarios involving complex backgrounds and body occlusion among multiple characters, suggesting the superiority of our approach.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the incoherent animation that existing character image animation methods produce in complex scenes, especially multi-character animation and body occlusion. In addition, current methods require large-scale, high-quality training videos with stable backgrounds and temporal consistency; without them, performance declines significantly. These problems limit the practical application of character image animation tools. To this end, the paper proposes a practical and robust framework, Follow-Your-Pose v2. The framework can be trained on noisy open-source videos that are readily available on the internet, and it uses multi-condition guiders to address background stability, body occlusion in multi-character generation, and character appearance consistency. To fill the gap in fair evaluation of multi-character pose animation, the paper also proposes a new benchmark containing approximately 4,000 frames. Extensive experiments show that the method outperforms existing state-of-the-art methods by more than 35% across two datasets and seven metrics. In particular, the quality of the generated videos improves significantly in scenes with complex backgrounds and multi-character body occlusion.
