Abstract: Production-level workflows for producing convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but their latent representations cannot give artists the explicit controls offered by conventional tools. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model the facial expression, geometry, and physically-based textures using separate VAEs, and we impose a global MLP-based expression mapping across the latent spaces of the respective networks to preserve characteristics across these attributes. We also model the delta information as wrinkle maps for the physically-based textures, achieving high-quality 4K dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and cross-identity facial motion retargeting. In addition, our multi-VAE-based neural asset, together with fast adaptation schemes, can be deployed to handle in-the-wild videos. We further demonstrate the utility of our explicit facial disentangling strategy through various physically-based editing results with high realism. Comprehensive experiments show that our technique achieves higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
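The abstract names three pieces: separate VAEs per attribute, a global MLP that maps expression into the other networks' latent spaces, and textures modeled as a neutral map plus a wrinkle-map delta. The sketch below shows one plausible way those pieces could fit together for a single frame, under stated assumptions. The abstract does not specify the architecture, so every name, layer size, and the use of tiny linear decoders in place of real convolutional decoders are hypothetical. This is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def init_linear(rng, n_in, n_out, scale=0.1):
    """Random weights and zero bias for a dense layer (untrained, illustration only)."""
    return scale * rng.standard_normal((n_in, n_out)), np.zeros(n_out)

class GlobalExpressionMLP:
    """Hypothetical two-layer MLP mapping an expression latent code to the
    latent spaces of a geometry VAE and a texture (wrinkle-map) VAE."""
    def __init__(self, rng, d_expr, d_hidden, d_geo, d_tex):
        self.W1, self.b1 = init_linear(rng, d_expr, d_hidden)
        self.W2, self.b2 = init_linear(rng, d_hidden, d_geo + d_tex)
        self.d_geo = d_geo

    def __call__(self, z_expr):
        h = np.maximum(0.0, z_expr @ self.W1 + self.b1)  # ReLU hidden layer
        out = h @ self.W2 + self.b2
        return out[..., :self.d_geo], out[..., self.d_geo:]  # (z_geo, z_tex)

class LinearDecoder:
    """Stand-in for a VAE decoder: a single linear layer reshaped to out_shape."""
    def __init__(self, rng, d_latent, out_shape):
        self.out_shape = tuple(out_shape)
        self.W, self.b = init_linear(rng, d_latent, int(np.prod(self.out_shape)))

    def __call__(self, z):
        return (z @ self.W + self.b).reshape(z.shape[:-1] + self.out_shape)

def animate_frame(mu_expr, logvar_expr, neutral_geo, neutral_tex,
                  mlp, geo_dec, wrinkle_dec, rng):
    """Decode one frame: sample an expression code, map it into the geometry and
    texture latent spaces, and add the decoded deltas to the neutral assets."""
    z_expr = reparameterize(mu_expr, logvar_expr, rng)
    z_geo, z_tex = mlp(z_expr)
    geometry = neutral_geo + geo_dec(z_geo)      # per-vertex offsets
    texture = neutral_tex + wrinkle_dec(z_tex)   # wrinkle map as a texture delta
    return geometry, np.clip(texture, 0.0, 1.0)  # keep texel values in [0, 1]

# Usage with toy sizes (a real asset would use 4K texture maps).
D_EXPR, D_GEO, D_TEX = 8, 16, 16
N_VERTS, TEX_H, TEX_W = 100, 8, 8
mlp = GlobalExpressionMLP(rng, D_EXPR, 32, D_GEO, D_TEX)
geo_dec = LinearDecoder(rng, D_GEO, (N_VERTS, 3))
wrinkle_dec = LinearDecoder(rng, D_TEX, (TEX_H, TEX_W, 3))
neutral_geo = rng.standard_normal((N_VERTS, 3))
neutral_tex = np.full((TEX_H, TEX_W, 3), 0.5)
geometry, texture = animate_frame(np.zeros(D_EXPR), np.zeros(D_EXPR),
                                  neutral_geo, neutral_tex,
                                  mlp, geo_dec, wrinkle_dec, rng)
print(geometry.shape, texture.shape)  # (100, 3) (8, 8, 3)
```

Routing every attribute through one shared expression code is what lets a single driving signal, such as a tracked video frame, update geometry and textures consistently. The separate decoders keep each attribute editable on its own.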

APB2FACE: Audio-Guided Face Reenactment with Auxiliary Pose and Blink Signals

APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment

FReeNet: Multi-Identity Face Reenactment

Realistic Face Reenactment Via Self-Supervised Disentangling of Identity and Pose

Audio-driven Talking Face Video Generation with Natural Head Pose

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

Neural Voice Puppetry: Audio-driven Facial Reenactment

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

EnNeRFACE: Improving the Generalization of Face Reenactment with Adaptive Ensemble Neural Radiance Fields

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Video-Driven Neural Physically-Based Facial Asset for Production

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Speech driven photo realistic facial animation based on an articulatory DBN model and AAM features

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

One-shot many-to-many facial reenactment using Bi-Layer Graph Convolutional Networks

MA-NeRF: Motion-Assisted Neural Radiance Fields for Face Synthesis from Sparse Images

AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction

Speech Driven MPEG-4 Based Face Animation via Neural Network

NOFA: NeRF-based One-shot Facial Avatar Reconstruction