Abstract: Audio-driven emotional 3D face animation aims to generate emotionally expressive talking heads with synchronized lip movements. However, previous research has often overlooked the influence of diverse emotions on facial expressions, or its methods have proved unsuitable for driving MetaHuman models. To address this gap, we introduce EmoFace, a novel audio-driven method for creating facial animations with vivid emotional dynamics. Our approach can generate facial expressions across multiple emotions and can produce random yet natural blinks and eye movements while maintaining accurate lip synchronization. We propose independent speech and emotion encoders to learn the relationship between audio, emotion, and the corresponding facial controller rigs, and finally map these into a sequence of controller values. Additionally, we introduce two post-processing techniques dedicated to enhancing the authenticity of the animation, particularly in blinks and eye movements. Furthermore, recognizing the scarcity of emotional audio-visual data suitable for MetaHuman model manipulation, we contribute an emotional audio-visual dataset and derive control parameters for each frame. Our method can be applied to producing dialogue animations of non-playable characters (NPCs) in video games and to driving avatars in virtual reality environments. Quantitative and qualitative experiments, as well as a user study comparing our approach with existing work, show that it achieves superior results in driving 3D facial models. The code and sample data are available at https://github.com/SJTU-Lucy/EmoFace.
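The abstract does not describe how the blink post-processing works. The following is only an illustrative sketch of one simple way to add "random yet natural" blinks on top of a predicted per-frame eyelid controller, not the EmoFace implementation. The function names `blink_curve` and `overlay_blinks`, the 30 fps frame rate, the 2 to 6 second gap between blinks, and the 7-frame blink length are all hypothetical values chosen for the example.

```python
import math
import random


def blink_curve(num_frames, fps=30, min_gap_s=2.0, max_gap_s=6.0,
                blink_frames=7, seed=None):
    """Hypothetical sketch: one blink-controller value in [0, 1] per frame.

    Blink onsets are separated by a random gap drawn uniformly from
    [min_gap_s, max_gap_s] seconds (an assumed range, not from the paper).
    """
    rng = random.Random(seed)
    curve = [0.0] * num_frames
    # First blink starts after one random gap.
    onset = int(rng.uniform(min_gap_s, max_gap_s) * fps)
    while onset < num_frames:
        for i in range(blink_frames):
            frame = onset + i
            if frame >= num_frames:
                break
            # Half-sine close-then-open profile; reaches 1.0 at mid-blink
            # when blink_frames is odd.
            curve[frame] = math.sin(math.pi * (i + 1) / (blink_frames + 1))
        # Next onset: end of this blink plus a fresh random gap, so blinks never overlap.
        onset += blink_frames + int(rng.uniform(min_gap_s, max_gap_s) * fps)
    return curve


def overlay_blinks(predicted, blinks):
    """Combine predicted eyelid values with synthetic blinks (element-wise max)."""
    return [max(p, b) for p, b in zip(predicted, blinks)]
```

For example, `overlay_blinks(predicted, blink_curve(len(predicted), seed=0))` adds blinks to a 10-second clip of predicted eyelid values. Taking the element-wise maximum means a closure already present in the prediction is never weakened by the synthetic curve; whether the actual method combines signals this way is not stated in the abstract.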

EAT-Face: Emotion-Controllable Audio-Driven Talking Face Generation Via Diffusion Model

Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

Emotionally Controllable Talking Face Generation from an Arbitrary Emotional Portrait

EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

Audio-driven Talking Face Video Generation with Natural Head Pose

Talking Face Generation With Audio-Deduced Emotional Landmarks

Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning

EmoFace: Audio-driven Emotional 3D Face Animation

Controllable Talking Face Generation by Implicit Facial Keypoints Editing

Continuously Controllable Facial Expression Editing in Talking Face Videos

EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation

Audio-Driven Emotional Video Portraits

Talking Faces: Audio-to-Video Face Generation

3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head