Abstract: Face presentation attacks (FPA), also known as face spoofing, have raised growing public concern through malicious applications such as financial fraud and privacy leakage. Safeguarding face recognition systems against FPA is therefore of utmost importance. Although existing learning-based face anti-spoofing (FAS) models can achieve outstanding detection performance, they lack generalization capability and suffer significant performance drops in unforeseen environments. Many methods address this limitation by using auxiliary modality data (e.g., depth and infrared maps) during presentation attack detection (PAD). However, these methods are limited because (1) they require specific sensors, such as depth and infrared cameras, for data capture, which are rarely available on commodity mobile devices, and (2) they cannot work properly in practical scenarios when either modality is missing or of poor quality. In this paper, we devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS to overcome these issues. The primary innovations of this work are as follows: (1) to achieve robust PAD, our system combines visual and auditory modalities using three commonly available sensors: camera, speaker, and microphone; (2) we design a novel two-branch neural network with three hierarchical feature aggregation modules to perform cross-modal feature fusion; (3) we propose a multi-head training strategy that allows the model to output predictions from the vision, acoustic, and fusion heads, resulting in a more flexible PAD. Extensive experiments demonstrate the accuracy, robustness, and flexibility of M3FAS under various challenging experimental settings. The source code and dataset are available at: https://github.com/ChenqiKONG/M3FAS/

Multi-modal Face Anti-spoofing Using Multi-fusion Network and Global Depth-wise Convolution

Multi-modal Face Anti-spoofing Using Channel Cross Fusion Network and Global Depth-Wise Convolution

A Cascade Face Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP

Face Anti-Spoofing with Human Material Perception

Self-Attention and MLP Auxiliary Convolution for Face Anti-Spoofing

Dual-Cross Central Difference Network for Face Anti-Spoofing

Multi-Modal Face Anti-Spoofing Based on Central Difference Networks

Reinforcing Face Anti-Spoofing with Multi-Scale Modality

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Face anti-spoofing with cross-stage relation enhancement and spoof material perception

M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System

Deep Learning for Face Anti-Spoofing: A Survey

Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing

Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Advanced Face Anti-Spoofing with Depth Segmentation

CASIA-SURF CeFA: A Benchmark for Multi-modal Cross-ethnicity Face Anti-spoofing

Face Anti-Spoofing by Fusing High and Low Frequency Features for Advanced Generalization Capability

Face Liveness Detection by rPPG Features and Contextual Patch-Based CNN

PipeNet: Selective Modal Pipeline of Fusion Network for Multi-Modal Face Anti-Spoofing

Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing