Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, Alex Kot
2024-03-05
Abstract: Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advancements in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they face challenges in generalizing to unseen attacks and deployment conditions. These challenges arise from (1) modality unreliability, where some modality sensors such as depth and infrared undergo significant domain shifts in varying environments, leading to the spread of unreliable information during cross-modal feature fusion, and (2) modality imbalance, where over-reliance on a dominant modality during training hinders the convergence of the others, reducing effectiveness against attack types that are indistinguishable using the dominant modality alone. To address modality unreliability, we propose the Uncertainty-Guided Cross-Adapter (U-Adapter) to recognize unreliably detected regions within each modality and suppress the impact of unreliable regions on other modalities. For modality imbalance, we propose a Rebalanced Modality Gradient Modulation (ReGrad) strategy to rebalance the convergence speed of all modalities by adaptively adjusting their gradients. In addition, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released on
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses multi-modal face anti-spoofing (FAS), a key technique for securing face recognition systems. Current multi-modal approaches generalize poorly to unseen attacks and deployment environments, and the paper attributes this to two main challenges:
1. Modality Unreliability: Some modality sensors, such as depth and infrared, may undergo significant domain shifts across environments, so the features extracted from them become unreliable and degrade cross-modal fusion.
2. Modality Imbalance: Training relies too heavily on the dominant modality, which hinders the convergence of the other modalities and weakens resistance to attack types that are difficult to distinguish using the dominant modality alone.
To address these problems, the paper proposes a framework called Multi-Modal Domain Generalized (MMDG), which includes two key components:
1. Uncertainty-Guided Cross-Adapter (U-Adapter): Uses the uncertainty of each modality to identify unreliable regions and suppress their influence, preventing unreliable information from propagating across modalities.
2. Rebalanced Modality Gradient Modulation (ReGrad) strategy: Adaptively adjusts the gradients of all modalities to balance their convergence speed, so that every modality is fully exploited to resist unseen attacks in the target domain.
Furthermore, the paper provides the first large-scale benchmark for evaluating multi-modal FAS under domain-generalization scenarios. The experiments show that the proposed method outperforms existing state-of-the-art approaches, and the code and protocols will be released on GitHub.
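To make the two ideas above more concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary does not give the actual formulas, so both rules below are simple stand-ins chosen for illustration. The function names `suppress_unreliable` and `rebalance_scales`, the `exp(-uncertainty)` weighting, the loss-ratio definition of convergence speed, and the clipping bounds are all assumptions.

```python
import math


def suppress_unreliable(features, uncertainties):
    """Down-weight per-region features by exp(-uncertainty).

    ASSUMPTION: an illustrative weighting, not the U-Adapter itself.
    A region with uncertainty 0 passes through unchanged; a highly
    uncertain region contributes almost nothing to other modalities.
    """
    if len(features) != len(uncertainties):
        raise ValueError("one uncertainty value is needed per region")
    return [f * math.exp(-u) for f, u in zip(features, uncertainties)]


def rebalance_scales(prev_losses, curr_losses, lo=0.1, hi=10.0):
    """Return a gradient scale per modality from its recent loss drop.

    ASSUMPTION: a loss-ratio heuristic, not the paper's ReGrad rule.
    Convergence speed is the relative loss decrease since the last
    check. Modalities converging faster than average get a scale
    below 1, slower ones a scale above 1, clipped to [lo, hi].
    """
    eps = 1e-8
    speeds = {
        m: max((prev_losses[m] - curr_losses[m]) / (prev_losses[m] + eps), eps)
        for m in prev_losses
    }
    mean_speed = sum(speeds.values()) / len(speeds)
    return {m: min(max(mean_speed / s, lo), hi) for m, s in speeds.items()}


if __name__ == "__main__":
    # Hypothetical depth features for three face regions; the third
    # region is treated as unreliable (high uncertainty).
    print(suppress_unreliable([1.0, 0.8, 1.2], [0.0, 0.1, 3.0]))

    # Hypothetical per-modality losses: RGB dominates and converges fastest.
    prev = {"rgb": 1.0, "depth": 1.0, "ir": 1.0}
    curr = {"rgb": 0.5, "depth": 0.9, "ir": 0.9}
    print(rebalance_scales(prev, curr))
```

In a training loop, scales like these would multiply each modality branch's gradients before the optimizer step, so the dominant modality is slowed down while the lagging ones are boosted.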