Hierarchical Feature Warping and Blending for Talking Head Animation

Jiale Zhang, Chengxin Liu, Ke Xian, Zhiguo Cao
DOI: https://doi.org/10.1109/tcsvt.2024.3375330
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Talking head animation transforms a source anime image to a target pose, where the transformation includes changes in facial expression and head movement. In contrast to existing approaches that operate on low-resolution images (256 × 256), we study this task at a higher resolution, e.g., 512 × 512. High-resolution talking head animation, however, raises two major challenges: i) how to achieve a smooth global transformation while maintaining the rich details of anime characters under large-displacement pose variations; ii) how to address the shortage of data, because no related dataset is publicly available. In this paper, we present a Hierarchical Feature Warping and Blending (HFWB) model, which tackles talking head animation hierarchically. Specifically, we use low-level features to control the global transformation and high-level features to determine the details of anime characters, under the guidance of feature flow fields. These features are then blended by selective fusion units, which output the transformed anime images. In addition, we construct an anime pose dataset, AniTalk-2K, to alleviate the shortage of data. It contains around 2000 anime characters with thousands of different face/head poses at a resolution of 512 × 512. Extensive experiments on AniTalk-2K demonstrate the superiority of our approach over state-of-the-art methods in generating high-quality anime talking heads.
engineering, electrical & electronic
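The abstract describes warping features under the guidance of flow fields and then blending them. The following is a minimal sketch of those two generic operations on a single-channel feature map, written in plain Python for illustration. It is not the authors' implementation: the function names `warp` and `blend`, the backward-warping convention, the border clamping, and the fixed blending mask are all assumptions, and the learned selective fusion units and the multi-level hierarchy of HFWB are not modeled here.

```python
import math


def warp(feature, flow):
    """Backward-warp a 2D feature map with a per-pixel flow field.

    feature: H x W nested list of numbers.
    flow:    H x W nested list of (dx, dy) offsets. Output pixel (x, y)
             samples the input at (x + dx, y + dy) using bilinear
             interpolation. Sample coordinates are clamped to the border
             (an assumption of this sketch).
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling location to the valid image area.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbours.
            top = (1 - ax) * feature[y0][x0] + ax * feature[y0][x1]
            bottom = (1 - ax) * feature[y1][x0] + ax * feature[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


def blend(a, b, mask):
    """Per-pixel convex blend: mask * a + (1 - mask) * b.

    In a learned model the mask would be predicted by the network;
    here it is simply passed in (hypothetical stand-in).
    """
    return [
        [m * p + (1 - m) * q for p, q, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(a, b, mask)
    ]


if __name__ == "__main__":
    feature = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    shift_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
    warped = warp(feature, shift_right)
    mask = [[0.5] * 3 for _ in range(3)]
    print(blend(warped, feature, mask))
```

In a deep-learning framework the same warping step would usually be done with a batched, differentiable sampling operator rather than explicit loops; the loop form is used here only to keep the arithmetic visible.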