Abstract: We present a new system that applies an example-based learning method to learn facial motion patterns from a video sequence of individual facial behavior, such as lip motion and facial expressions, and uses them to create vivid three-dimensional (3-D) face animation according to the definition of MPEG-4 face animation parameters. The system consists of three key modules: face tracking, pattern learning, and face animation. In face tracking, to reduce the complexity of the tracking process, a novel coarse-to-fine strategy combined with a Kalman filter is proposed for localizing key facial landmarks in each image of the video. The landmark sequence is normalized into a visual feature matrix and then fed to the next processing step. In pattern learning, during the pretraining stage, the parameters of the camera that took the video are required along with the training video data, so that the system can estimate, with the assistance of computer vision methods, the basic mapping from a normalized two-dimensional (2-D) visual feature matrix to its representation in the 3-D MPEG-4 face animation parameter space. In the practice stage, considering that in most cases camera parameters are not provided with video data, the system uses machine learning to complement the incomplete 3-D information for the mapping, information that is needed to represent face orientation. The example-based learning in this system integrates several methods, including clustering, hidden Markov models (HMM), and artificial neural networks (ANN), to achieve a better conversion from the 2-D to the 3-D model and a better estimate of the incomplete 3-D information for the mapping; the result is then used to drive the face animation. In face animation, the system can synthesize face animation following any type of face motion in the video. Experiments show that our system produces more vivid face motion animation than earlier systems.
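As a rough illustration of how a Kalman filter can stabilize per-frame landmark detections, the sketch below filters each coordinate of one landmark independently. The abstract does not give the state model, the noise settings, or the way the filter is coupled to the coarse-to-fine search, so the random-walk model, the values of `q` and `r`, and the names `ScalarKalman` and `smooth_track` are all assumptions made for this example, not the system's actual design.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk (constant-position) state model.

    Assumption: the paper's real state model is not given in the abstract.
    """
    x: float          # current estimate, e.g. one landmark coordinate in pixels
    p: float = 1.0    # variance of the estimate
    q: float = 0.01   # process noise variance (hypothetical value)
    r: float = 4.0    # measurement noise variance (hypothetical value)

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        p_pred = self.p + self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = p_pred / (p_pred + self.r)
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


def smooth_track(points):
    """Filter a frame-by-frame sequence of (x, y) detections of one landmark."""
    if not points:
        return []
    fx = ScalarKalman(points[0][0])
    fy = ScalarKalman(points[0][1])
    out = [points[0]]  # the first detection initializes both filters
    for x, y in points[1:]:
        out.append((fx.update(x), fy.update(y)))
    return out


if __name__ == "__main__":
    # A sudden 20-pixel jump in x is damped rather than followed exactly.
    print(smooth_track([(10.0, 20.0), (10.0, 20.0), (30.0, 20.0)]))
```

In a tracker of this kind, the filtered position from the previous frame could plausibly serve as the center of a narrower search window in the next frame, which is one way such a filter can reduce tracking cost.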

Speech Driven MPEG-4 Based Face Animation via Neural Network

APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment

APB2FACE: Audio-Guided Face Reenactment with Auxiliary Pose and Blink Signals.

Real-Time Audio-Guided Multi-Face Reenactment

Mining Audio/Visual Database For Speech Driven Face Animation

Data Mining and Speech Driven Face Animation

Learning and Synthesizing MPEG-4 Compatible 3-D Face Animation from Video Sequence

Audio-driven Talking Face Video Generation with Natural Head Pose

Real-time Speech-Driven Animation of Expressive Talking Faces.

Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

Face Animation Based on Large Audiovisual Database

Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules

Low-Rank Active Learning for Generating Speech-Drive Human Face Animation

Expressive Face Animation Synthesis Based on Dynamic Mapping Method

Video-audio Driven Real-Time Facial Animation.

Real-time speech-driven lip synchronization

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

MPEG-4 Based Facial Expression Image Morphing

SATFace: Subject Agnostic Talking Face Generation with Natural Head Movement

Real-time Synthesis of Chinese Visual Speech and Facial Expressions Using MPEG-4 FAP Features in a Three-Dimensional Avatar

Speech-driven Facial Animation with Spectral Gathering and Temporal Attention.