Abstract: Face anti-spoofing (FAS) and face forgery detection play vital roles in securing face biometric systems against presentation attacks (PAs) and malicious digital manipulation (e.g., deepfakes). Despite satisfactory performance with large-scale data and powerful deep models, recent face spoofing and forgery detection approaches usually focus on 1) unimodal visual appearance or physiological (i.e., remote photoplethysmography (rPPG)) cues; and 2) separate feature representations for FAS or face forgery detection. On the one hand, unimodal appearance and rPPG features are vulnerable to high-fidelity 3D face mask attacks and video replay attacks, respectively, which motivates the design of reliable multi-modal fusion mechanisms for generalized FAS. On the other hand, FAS and face forgery detection share rich common features (e.g., periodic rPPG rhythms and vanilla appearance for bona fide faces), which supports designing a joint FAS and face forgery detection system in a multi-task learning fashion. In this paper, we establish the first joint face spoofing and forgery detection benchmark using both visual appearance and physiological rPPG cues. To enhance rPPG periodicity discrimination, we design a two-branch physiological network that takes both the facial spatio-temporal rPPG signal map and its continuous wavelet transformed counterpart as inputs. To mitigate modality bias and improve fusion efficacy, we apply weighted batch and layer normalization to both appearance and rPPG features before multi-modal fusion. We also investigate prevalent deep models, feature fusion strategies, and multi-task learning configurations for joint face spoofing and forgery detection. We find that the generalization capacities of both unimodal (appearance or rPPG) and multi-modal (appearance+rPPG) models can be clearly improved via joint training on these two tasks. We hope this new benchmark will facilitate future research in both the FAS and deepfake detection communities. The code will be released at https://github.com/ZitongYu/Benchmarking.
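
As a rough illustration of normalizing each modality before fusion, the following minimal sketch blends batch normalization and layer normalization with a fixed weight and then concatenates the two feature sets. It is not the paper's implementation: the function names, the fixed blending weight `alpha`, the absence of learnable parameters, and the use of concatenation as the fusion step are all assumptions made for illustration.

```python
import math


def _standardize(values, eps=1e-5):
    """Scale a list of floats to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch):
    """Normalize each feature (column) across the samples in the batch."""
    cols = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]


def layer_norm(batch):
    """Normalize each sample (row) across its own features."""
    return [_standardize(row) for row in batch]


def weighted_norm(batch, alpha=0.5):
    """Blend the two normalizations: alpha * BN + (1 - alpha) * LN.

    alpha is a hypothetical fixed weight; the source does not say how
    the weighting is chosen or learned.
    """
    bn, ln = batch_norm(batch), layer_norm(batch)
    return [
        [alpha * b + (1 - alpha) * l for b, l in zip(row_bn, row_ln)]
        for row_bn, row_ln in zip(bn, ln)
    ]


def fuse(appearance, rppg, alpha=0.5):
    """Normalize each modality separately, then concatenate per sample.

    Concatenation is an assumed stand-in for the fusion strategy.
    """
    a = weighted_norm(appearance, alpha)
    r = weighted_norm(rppg, alpha)
    return [row_a + row_r for row_a, row_r in zip(a, r)]


# Toy features with very different scales across the two modalities.
appearance_feats = [[100.0, 200.0, 300.0], [110.0, 190.0, 320.0]]
rppg_feats = [[0.01, 0.02], [0.03, 0.01]]
fused = fuse(appearance_feats, rppg_feats)
```

Because each modality is rescaled before concatenation, the appearance features (values in the hundreds here) no longer dominate the rPPG features (values near 0.01) purely because of their magnitude, which is the intuition behind normalizing before fusion.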

Joint Gaze Correction and Face Beautification for Conference Video using Dual Sparsity Prior

Video Conference System for Enhancing Quality of Target Region under Low Bit Rate

Chunk-wise Face Model Based Gaze Correction in Conversational Videos with Single Camera

Eye gaze correction with stereovision for video-teleconferencing

Video-driven state-aware facial animation

Joint Structured Sparsity Regularized Multiview Dimension Reduction for Video-Based Facial Expression Recognition.

Indicating eye contacts in one-to-many video teleconference with one web camera

Towards Ultra-Low-Bitrate Video Conferencing Using Facial Landmarks

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild.

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

A Scalable Video Conferencing System Using Cached Facial Expressions.

Face beautification: Beyond makeup transfer

A Data-Driven Approach for Facial Expression Retargeting in Video

Towards Realistic Visual Dubbing with Heterogeneous Sources

G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing

Content-aware Facial Image Compression with Deep Learning Method

Facial Depth Map Enhancement Via Neighbor Embedding.

Facial Expression Recognition Via Weighted Group Sparsity.

Benchmarking Joint Face Spoofing and Forgery Detection With Visual and Physiological Cues

Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Face Denoising and 3D Reconstruction from A Single Depth Image