Research on face video generation algorithm based on speech content

Chengqin Wu, Yulin Sun, Liangyu Chen, Qunqin Pan, Chao Zhang
DOI: https://doi.org/10.1109/ICICML60161.2023.10424912
2023-11-03
Abstract: To address two problems in speech-driven talking-face video, namely audio that is out of sync with lip motion and white glare on the face, this paper proposes an audio-visual synchronization module and a highlight noise reduction module built on the Wav2Lip network. First, the audio-visual synchronization module is added so that the model learns the context of the speech signal in both the forward and backward directions, which improves lip-synchronization accuracy. Second, the highlight noise reduction module (HDM) is introduced into the image encoder to suppress noise and overexposure in the highlight regions of the generated talking-face video. Experimental results on the LRS2 dataset show that the proposed audio-visual quality alignment network (AVQA) reaches a confidence score (LSE-C) of 8.565, a deviation index (LSE-D) of 4.605, and an image structural similarity (SSIM) of 0.872.
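For readers unfamiliar with the image-quality metric reported above, the following is a minimal sketch of SSIM computed over a single global window. It is not the paper's evaluation code: the function name `global_ssim` and the flat-list image representation are assumptions made for illustration, and standard SSIM implementations average the same formula over local (often Gaussian-weighted) windows rather than over the whole image at once.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    x, y: flat sequences of pixel intensities (hypothetical representation).
    data_range: difference between the largest and smallest possible pixel value.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")

    n = len(x)
    # Stabilising constants from the standard SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

A score of 1 means the two images are identical under this measure, and lower scores indicate greater structural difference, so the reported SSIM of 0.872 is read as closer to 1 being better.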
Computer Science