Abstract: Emotion is considered a core element of performance. In computer animation, body motions and facial expressions are two popular media through which a character expresses emotion. However, there has been limited research on how to synthesize these two types of character movement at different levels of emotion strength with intuitive control, a problem that is difficult to model effectively. In this work, we explore a common model that can represent emotion for both body motion synthesis and facial expression synthesis. Unlike previous work that encodes emotions into discrete motion style descriptors, we propose a continuous control indicator called emotion strength, and we present a data-driven approach that uses it to synthesize motions and edit images with fine control over emotions. Rather than interpolating motion features to synthesize new motion as in existing work, our method explicitly learns a model mapping low-level features to the emotion strength. Because the motion synthesis model is learned in the training stage, the computation time required for synthesizing motions at run time is very low. We further demonstrate the generality of our proposed framework by editing 2D face images and 3D skeletal motion using relative emotion strength. As a result, our method can be applied to interactive applications such as computer games, image editing tools, and virtual reality applications, as well as offline applications such as animation and movie production.
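The idea of learning a mapping from low-level features to a continuous emotion strength, and then editing a sample to reach a chosen strength, can be illustrated with a small sketch. The abstract does not specify the learner, so everything below is an assumption made for illustration: a linear scoring function trained on pairwise "stronger than" comparisons with a margin-perceptron update, toy two-dimensional feature vectors, and the hypothetical helper names `fit_strength_direction`, `emotion_strength`, and `edit_to_strength`. It is not the authors' implementation.

```python
def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_strength_direction(pairs, dim, epochs=500, lr=0.1, margin=1.0):
    """Learn a weight vector w from relative comparisons (illustrative only).

    pairs: list of (stronger_features, weaker_features).
    After training on separable data, dot(w, stronger) exceeds
    dot(w, weaker) by at least `margin` for every pair.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for strong, weak in pairs:
            diff = [s - k for s, k in zip(strong, weak)]
            if dot(w, diff) < margin:  # ordering violated or margin too small
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def emotion_strength(w, features):
    """Continuous emotion strength score of a feature vector."""
    return dot(w, features)


def edit_to_strength(w, features, target):
    """Shift features the minimal distance along w so the score equals target."""
    step = (target - dot(w, features)) / dot(w, w)
    return [f + step * wi for f, wi in zip(features, w)]


# Toy, made-up low-level features for three samples of the same emotion.
neutral, mild, strong = [0.2, 0.1], [0.5, 0.3], [0.9, 0.2]
pairs = [(strong, mild), (mild, neutral), (strong, neutral)]

w = fit_strength_direction(pairs, dim=2)

# Ask for a strength halfway between the mild and strong samples.
target = 0.5 * (emotion_strength(w, mild) + emotion_strength(w, strong))
edited = edit_to_strength(w, mild, target)
```

In this sketch all learning happens in `fit_strength_direction`; producing an edited sample at run time is a single closed-form step, which mirrors the abstract's point that run-time cost is low once the model has been trained.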

A Continuous Emotional Editing Model for Talking Head Videos Based on Decoupling Texture and Geometry

Continuously Controllable Facial Expression Editing in Talking Face Videos

Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion

LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space

Context-Aware Talking-Head Video Editing

Task-agnostic Temporally Consistent Facial Video Editing

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Replacement of Facial Parts in Images

Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

Facial expression editing in video using a temporally-smooth factorization

Emotionally Controllable Talking Face Generation from an Arbitrary Emotional Portrait

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

Audio-Driven Emotional Video Portraits

A generic framework for editing and synthesizing multimodal data with relative emotion strength

Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation

Facial Expression Editing with Continuous Emotion Labels