TalkingFlow: Talking Facial Landmark Generation with Multi-Scale Normalizing Flow Network

Sen Liang, Zhize Zhou, Rong Li, Juyong Zhang, Hujun Bao
DOI: https://doi.org/10.1109/icassp43922.2022.9747663
2022-01-01
Abstract: Deterministic models dominate the field of talking facial landmark generation by directly mapping speech signals to a single lip-synced facial landmark sequence, and they often suffer from regression to the mean face. In contrast, probabilistic generative models are better suited to handling complex data spaces and generating diverse samples. In this work, we propose a flow-based probabilistic network named TalkingFlow to generate natural talking facial landmarks with head movements from speech data. It is implemented with a weighted multi-scale architecture to improve the model's representation capability and a conditional Temporal Convolutional Network module to fuse speech data. Extensive experimental results show that it can effectively generate diverse and natural facial landmarks from speech data. All code will be made publicly available online.