Abstract: Face presentation attacks (FPA), also known as face spoofing, have brought increasing concerns to the public through various malicious applications, such as financial fraud and privacy leakage. Therefore, safeguarding face recognition systems against FPA is of utmost importance. Although existing learning-based face anti-spoofing (FAS) models can achieve outstanding detection performance, they lack generalization capability and suffer significant performance drops in unforeseen environments. Many methods use auxiliary modality data (e.g., depth and infrared maps) during presentation attack detection (PAD) to address this limitation. However, these methods are limited because (1) they require specific sensors such as depth and infrared cameras for data capture, which are rarely available on commodity mobile devices, and (2) they cannot work properly in practical scenarios when either modality is missing or of poor quality. In this paper, we devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS to overcome the issues above. The primary innovations of this work are as follows: (1) To achieve robust PAD, our system combines visual and auditory modalities using three commonly available sensors: camera, speaker, and microphone; (2) We design a novel two-branch neural network with three hierarchical feature aggregation modules to perform cross-modal feature fusion; (3) We propose a multi-head training strategy, allowing the model to output predictions from the vision, acoustic, and fusion heads, resulting in a more flexible PAD. Extensive experiments have demonstrated the accuracy, robustness, and flexibility of M3FAS under various challenging experimental settings. The source code and dataset are available at: https://github.com/ChenqiKONG/M3FAS/
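
The flexibility claimed for the multi-head design can be illustrated with a small sketch. The abstract states only that the model outputs predictions from the vision, acoustic, and fusion heads; it does not say how a head is chosen at inference time. The selection rule, the function names (`pick_head`, `predict_live`), the score convention (higher means more likely live), and the 0.5 threshold below are all hypothetical and are not taken from the M3FAS code.

```python
from typing import Dict


def pick_head(vision_ok: bool, acoustic_ok: bool) -> str:
    """Choose which prediction head to trust (hypothetical rule).

    Assumption: use the fusion head when both modalities are usable,
    otherwise fall back to the single head whose input is available.
    """
    if vision_ok and acoustic_ok:
        return "fusion"
    if vision_ok:
        return "vision"
    if acoustic_ok:
        return "acoustic"
    raise ValueError("no usable modality")


def predict_live(
    head_scores: Dict[str, float],
    vision_ok: bool,
    acoustic_ok: bool,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> bool:
    """Return True if the selected head's liveness score reaches the threshold."""
    head = pick_head(vision_ok, acoustic_ok)
    return head_scores[head] >= threshold


# Example with made-up scores from the three heads.
scores = {"vision": 0.8, "acoustic": 0.3, "fusion": 0.6}
both_modalities = predict_live(scores, vision_ok=True, acoustic_ok=True)
audio_only = predict_live(scores, vision_ok=False, acoustic_ok=True)
```

With these made-up scores, the decision comes from the fusion head when both inputs are usable and from the acoustic head alone when the camera input is unavailable, which is the kind of fallback the abstract's second limitation (a missing or poor-quality modality) calls for.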

Multi-modal Face Anti-spoofing Using Channel Cross Fusion Network and Global Depth-Wise Convolution

Multi-modal Face Anti-spoofing Using Multi-fusion Network and Global Depth-wise Convolution

Dual-Cross Central Difference Network for Face Anti-Spoofing

A Cascade Face Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP

Multi-Modal Face Anti-Spoofing Based on Central Difference Networks

Selective Domain-Invariant Feature Alignment Network for Face Anti-Spoofing

Self-Attention and MLP Auxiliary Convolution for Face Anti-Spoofing

Face Anti-Spoofing with Human Material Perception

Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing

Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Face anti-spoofing with cross-stage relation enhancement and spoof material perception

Face Anti-Spoofing by Fusing High and Low Frequency Features for Advanced Generalization Capability

Face Liveness Detection by rPPG Features and Contextual Patch-Based CNN

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Two-stream Convolutional Networks for Multi-frame Face Anti-spoofing

Face anti-spoofing using patch and depth-based CNNs

PipeNet: Selective Modal Pipeline of Fusion Network for Multi-Modal Face Anti-Spoofing

Reinforcing Face Anti-Spoofing with Multi-Scale Modality

Deep Learning for Face Anti-Spoofing: A Survey

M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System