Abstract: Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with a fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which make synthesized talking face videos far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth background transitions, we propose a novel memory-augmented GAN module. By first training a general mapping on a publicly available dataset and then fine-tuning the mapping using the input short video of the target person, we develop an effective strategy that requires only a small number of frames (about 300) to learn personalized talking behavior, including head pose. Extensive experiments and two user studies show that our method can generate high-quality talking face videos (i.e., with personalized head movements, expressions and good lip synchronization) that look natural and show more distinctive head movements than those produced by state-of-the-art methods.
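The abstract does not describe the internals of the memory-augmented GAN module. As a rough illustration of what a memory read generally involves, the sketch below implements a generic key-value memory queried by cosine-similarity softmax attention, in pure Python. The class name `KeyValueMemory`, the temperature parameter, and the idea that keys hold per-frame appearance features are hypothetical choices for illustration; this is not the authors' design.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class KeyValueMemory:
    """Hypothetical generic key-value memory (not the paper's module).

    Assumption for illustration: keys are appearance features of stored
    frames, values are the features handed back to a generator.
    """

    def __init__(self, temperature: float = 0.1):
        self.keys: List[List[float]] = []
        self.values: List[List[float]] = []
        self.temperature = temperature

    def write(self, key: Sequence[float], value: Sequence[float]) -> None:
        # Store one (key, value) slot.
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query: Sequence[float]) -> List[float]:
        # Attention weights over all slots, then a weighted sum of the values.
        if not self.keys:
            raise ValueError("memory is empty")
        weights = softmax([cosine(query, k) for k in self.keys], self.temperature)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]


if __name__ == "__main__":
    mem = KeyValueMemory(temperature=0.1)
    mem.write([1.0, 0.0], [10.0, 0.0])
    mem.write([0.0, 1.0], [0.0, 10.0])
    # The query is closest to the first key, so the result is close to [10, 0].
    print(mem.read([0.9, 0.1]))
```

A soft (weighted) read rather than a hard nearest-neighbour lookup keeps the operation differentiable, which is what lets a memory of this kind be trained end to end inside a GAN generator.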

Multi-View Face Image Synthesis Using Factorization Model

A synthesis method for personalized 3D face reconstruction

Facial Expression Synthesis and Recognition Using a Kernel-Based Factorization Model

Kernel-Based Multifactor Analysis for Image Synthesis and Recognition

Frontal Face Synthesizing According to Multiple Non-Frontal Inputs and Its Application in Face Recognition

Facial expressive image analysis by using nonlinear factorization model

Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations

Image-Based Multi-View 3D Face Generation

Multi-view Face Synthesis Using Minimum Bending Deformation

Face Pose Estimate and Multi-pose Synthesize by 2D Morphable Model

Multiple Representations-Based Face Sketch–Photo Synthesis

Automatic frontal view face image synthesis

Face Pose Estimation and Synthesis by 2D Morphable Model

Synthesis of Face Image with Pose Variations

Synthesizing For Face Recognition

Multimodal Face Synthesis From Visual Attributes

Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures

Face Sketch-Photo Synthesis under Multi-dictionary Sparse Representation Framework

Audio-driven Talking Face Video Generation with Natural Head Pose

Image-based facial sketch-to-photo synthesis via online coupled dictionary learning

Frontal Face Synthesis Based on Multiple Pose-Variant Images for Face Recognition