Talking Face Video Generation with Editable Expression

Luchuan Song, Bin Liu, Nenghai Yu
DOI: https://doi.org/10.1007/978-3-030-87361-5_61
2021-01-01
Abstract: In recent years, convolutional neural networks have proved highly successful at generating talking faces. Existing methods combine a single face image with speech to generate a talking face video. The limitation of these methods is that only the lips move in the video; other facial expressions, such as blinking and eyebrow movements, are missing. To solve this problem, this paper proposes an embedding system for talking face video generation that takes a still image of a person and an audio clip containing speech. Some of the natural expressions can be modified through a high-level structure, i.e., the facial landmarks. Compared with direct audio-to-image methods, our approach avoids spurious correlations between audio-visual signals that are unrelated to the speech content. In addition, for the face generation network, a face sequence generation method based on single-sample learning is designed.