Speech-driven Facial Animation with Spectral Gathering and Temporal Attention
Chai Yujin, Weng Yanlin, Wang Lvdi, Zhou Kun
DOI: https://doi.org/10.1007/s11704-020-0133-7
IF: 2.6688
2021-01-01
Frontiers of Computer Science
Abstract: In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining a spectral-dimensional bidirectional long short-term memory (LSTM) network with a temporal attention mechanism, we design a lightweight speech encoder that learns useful and robust vocal features from the input audio without resorting to pre-trained speech recognition modules or large training datasets. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be synthesized better than with vertex offsets. Compared with state-of-the-art methods based on automatic speech recognition, our model is much smaller yet achieves similar robustness and quality in most cases, and noticeably better results in certain challenging cases.
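The abstract says facial motion is represented with deformation gradients rather than vertex offsets. As a rough illustration only (not the authors' code), the sketch below computes a per-triangle deformation gradient using the common construction from Sumner and Popović's deformation transfer work. Each triangle's two edges, plus its normal scaled by 1/sqrt of the normal's length, form a 3x3 frame V, and F = V_deformed V_rest^-1 maps the rest frame to the deformed one. Whether the paper uses exactly this construction is an assumption, and the helper names (`edge_frame`, `inverse3`, `matmul3`, `deformation_gradient`) are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def edge_frame(v1: Vec3, v2: Vec3, v3: Vec3) -> Mat3:
    """3x3 matrix with columns [v2 - v1, v3 - v1, n / sqrt(|n|)].

    Assumption: the Sumner-Popovic fourth-vertex construction. Scaling the
    normal by 1/sqrt(|n|) makes it grow linearly with triangle size, so a
    rotated or uniformly scaled triangle yields that transform applied to
    the original frame. Assumes the triangle is non-degenerate.
    """
    e1, e2 = sub(v2, v1), sub(v3, v1)
    n = cross(e1, e2)
    s = math.sqrt(math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2))
    e3 = (n[0] / s, n[1] / s, n[2] / s)
    return [[e1[i], e2[i], e3[i]] for i in range(3)]


def inverse3(m: Mat3) -> Mat3:
    """Inverse of a 3x3 matrix via its adjugate (assumes it is non-singular)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def matmul3(p: Mat3, q: Mat3) -> Mat3:
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def deformation_gradient(rest: tuple[Vec3, Vec3, Vec3],
                         deformed: tuple[Vec3, Vec3, Vec3]) -> Mat3:
    """Per-triangle gradient F satisfying F @ V_rest = V_deformed."""
    return matmul3(edge_frame(*deformed), inverse3(edge_frame(*rest)))


if __name__ == "__main__":
    rest = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    # The same triangle rotated 90 degrees about z, then translated by (5, 5, 5).
    moved = ((5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0))
    print(deformation_gradient(rest, moved))
```

For the example triangle, F should equal the 90-degree rotation about z (up to rounding), with no trace of the translation, whereas the vertex offsets would be dominated by it. Because translation cancels out of the edge differences, F captures only each triangle's local rotation, scale, and shear, which is the kind of local quantity the abstract credits for better synthesis of nuanced motion. Turning predicted gradients back into vertex positions requires an additional least-squares solve over the mesh, which is not shown here.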