Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo
2024-06-13
Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion. Additionally, current methods request large-scale high-quality videos with stable backgrounds and temporal consistency as training datasets, otherwise, their performance will greatly deteriorate. These two issues hinder the practical utilization of character image animation tools. In this paper, we propose a practical and robust framework Follow-Your-Pose v2, which can be trained on noisy open-sourced videos readily available on the internet. Multi-condition guiders are designed to address the challenges of background stability, body occlusion in multi-character generation, and consistency of character appearance. Moreover, to fill the gap of fair evaluation of multi-character pose animation, we propose a new benchmark comprising approximately 4,000 frames. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics. Meanwhile, qualitative assessments reveal a significant improvement in the quality of generated video, particularly in scenarios involving complex backgrounds and body occlusion of multi-character, suggesting the superiority of our approach.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper targets the incoherent animation that existing character image animation methods produce in complex scenes, particularly multi-character animation and body occlusion. Current methods also require large-scale, high-quality training videos with stable backgrounds and temporal consistency; without them, performance declines significantly. Both problems limit the practical use of character image animation tools. To address them, the paper proposes Follow-Your-Pose v2, a practical and robust framework that can be trained on noisy open-source videos readily available on the internet. Multi-condition guiders are designed to handle background stability, body occlusion in multi-character generation, and consistency of character appearance. To fill the gap in fair evaluation of multi-character pose animation, the paper also proposes a new benchmark of approximately 4,000 frames. Extensive experiments show that the method outperforms state-of-the-art methods by a margin of over 35% across two datasets and seven metrics. Qualitatively, the generated videos improve significantly in scenes with complex backgrounds and multi-character body occlusion.