Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network

Na Liu, Tao Zhou, Yunfeng Ji, Ziyi Zhao, Lihong Wan
DOI: https://doi.org/10.1016/j.patcog.2020.107231
IF: 8
2020-01-01
Pattern Recognition
Abstract:
• An effective landmark localization pipeline based on landmark detection, optical flow estimation, and Kalman filtering is proposed to avoid face shake.
• A part-based autoencoder is introduced to learn low-dimensional representations of different face regions.
• A sequence-to-sequence convolutional neural network with residual units is proposed to learn the mapping from phonemes to facial codes.
• The method is tested on two public audio-visual datasets and a new dataset called Chinese CCTV News, demonstrating its effectiveness against other state-of-the-art methods.
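The abstract does not give implementation details for the landmark stabilization step. To illustrate only the Kalman filtering idea it mentions, here is a minimal sketch that smooths a single landmark coordinate over consecutive frames with a scalar random-walk Kalman filter. The function name `kalman_smooth` and the noise parameters `process_var` and `measurement_var` are assumptions chosen for illustration, not values from the paper, and the paper's full pipeline also involves landmark detection and optical flow, which are not shown here.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """Smooth a 1-D sequence (e.g. one landmark coordinate per frame).

    Illustrative scalar Kalman filter with a random-walk state model.
    The parameter values are assumptions, not taken from the paper.
    """
    if not measurements:
        return []

    estimate = float(measurements[0])  # initial state: first observation
    error_var = 1.0                    # initial uncertainty of the estimate
    smoothed = [estimate]

    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, uncertainty grows.
        error_var += process_var

        # Update: blend the prediction with the new measurement.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)

        smoothed.append(estimate)

    return smoothed


if __name__ == "__main__":
    # Hypothetical jittery x-coordinate of one landmark across frames.
    raw_x = [100, 102, 98, 101, 99, 103, 97]
    print(kalman_smooth(raw_x))
```

A larger `measurement_var` relative to `process_var` makes the filter trust the detector less and smooth more aggressively, at the cost of lagging behind real motion.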