Audio-Driven 3D Facial Animation from In-the-Wild Videos

Liying Lu, Tianke Zhang, Yunfei Liu, Xuangeng Chu, Yu Li

2023-06-20

Abstract: Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head. Existing methods typically rely on training their models using limited public 3D datasets that contain a restricted number of audio-3D scan pairs. Consequently, their generalization capability remains limited. In this paper, we propose a novel method that leverages in-the-wild 2D talking-head videos to train our 3D facial animation model. The abundance of easily accessible 2D talking-head videos equips our model with a robust generalization capability. By combining these videos with existing 3D face reconstruction methods, our model excels in generating consistent and high-fidelity lip synchronization. Additionally, our model proficiently captures the speaking styles of different individuals, allowing it to generate 3D talking-heads with distinct personal styles. Extensive qualitative and quantitative experimental results demonstrate the superiority of our method.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses audio-driven 3D facial animation generation, with four specific objectives:

1. **Improving generalization**: Existing methods typically train on limited public 3D datasets that contain few audio-3D scan pairs, which limits how well they generalize. The paper instead trains the 3D facial animation model on abundant, easily accessible in-the-wild 2D talking-head videos.
2. **Generating high-fidelity lip synchronization**: By combining these videos with existing 3D face reconstruction methods, the model produces consistent, high-fidelity lip synchronization across different individuals.
3. **Capturing different speaking styles**: The model captures the speaking styles of different individuals and generates 3D talking heads with distinct personal styles.
4. **Emotion control**: The model also supports emotion control, generating expressions that match a specified emotional style (such as anger, happiness, or sadness).

In summary, the paper's main contribution is a method that trains 3D facial animation models on in-the-wild 2D video data, addressing the data scarcity of existing methods and improving the realism and lip-sync accuracy of the generated 3D facial animations.
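The idea of pairing 2D talking-head videos with an existing 3D face reconstruction method can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `audio_windows` and `build_pairs`, the `reconstruct` callable, and the sample rate and frame rate used in the usage example are all hypothetical. The sketch only shows how per-frame 3D parameters, obtained from some reconstruction method, could be aligned with the audio of the same video to form training pairs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder types: a real frame would be a pixel array, and the
# coefficients would be whatever the chosen 3D face reconstruction method outputs.
Frame = Sequence[float]
Coeffs = List[float]


def audio_windows(samples: Sequence[float], sample_rate: int, fps: float) -> List[List[float]]:
    """Split raw audio samples into one window per video frame."""
    per_frame = sample_rate / fps              # audio samples covered by one video frame
    n_frames = int(len(samples) / per_frame)   # only keep complete windows
    return [
        list(samples[round(i * per_frame): round((i + 1) * per_frame)])
        for i in range(n_frames)
    ]


def build_pairs(
    frames: Sequence[Frame],
    samples: Sequence[float],
    sample_rate: int,
    fps: float,
    reconstruct: Callable[[Frame], Coeffs],
) -> List[Tuple[List[float], Coeffs]]:
    """Pair each audio window with the 3D parameters reconstructed from its frame.

    `reconstruct` stands in for an existing 3D face reconstruction method
    (an assumption of this sketch); its output serves as the training target.
    """
    windows = audio_windows(samples, sample_rate, fps)
    n = min(len(frames), len(windows))         # drop any unmatched tail
    return [(windows[i], reconstruct(frames[i])) for i in range(n)]


if __name__ == "__main__":
    # Toy data: one second of audio at an assumed 16 kHz, 20 dummy frames at an assumed 25 fps.
    audio = [0.0] * 16000
    video = [[float(i)] for i in range(20)]
    pairs = build_pairs(video, audio, 16000, 25.0, reconstruct=lambda f: [f[0] * 2.0])
    print(len(pairs), len(pairs[0][0]), pairs[3][1])
```

In a sketch like this, the reconstructed parameters play the role that audio-3D scan pairs play in the limited public datasets, which is why the quality of the chosen reconstruction method would directly bound the quality of the supervision.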

Audio-driven Talking Face Video Generation with Natural Head Pose

Meta Talk: Learning To Data-Efficiently Generate Audio-Driven Lip-Synchronized Talking Face With High Definition

VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior

Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape

Learning Audio-Driven Viseme Dynamics for 3D Face Animation

Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts

Audio-driven talking face generation with diverse yet realistic facial animations

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Video-audio Driven Real-Time Facial Animation

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation

DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation

3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Learn2Talk: 3D Talking Face Learns from 2D Talking Face

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

FaceFormer: Speech-Driven 3D Facial Animation with Transformers

MakeItTalk: Speaker-Aware Talking-Head Animation

Transferring of Speech Movements from Video to 3D Face Space

High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model