FaceCat: Enhancing Face Recognition Security with a Unified Diffusion Model

Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Zhaoxia Yin
2024-08-27
Abstract: Face anti-spoofing (FAS) and adversarial detection (FAD) have been regarded as critical technologies to ensure the safety of face recognition systems. However, due to limited practicality, complex deployment, and the additional computational overhead, it is necessary to implement both detection techniques within a unified framework. This paper aims to achieve this goal by breaking through two primary obstacles: 1) the suboptimal face feature representation and 2) the scarcity of training data. To address the limited performance caused by existing feature representations, motivated by the rich structural and detailed features of face diffusion models, we propose FaceCat, the first approach leveraging the diffusion model to simultaneously enhance the performance of FAS and FAD. Specifically, FaceCat elaborately designs a hierarchical fusion mechanism to capture rich face semantic features of the diffusion model. These features then serve as a robust foundation for a lightweight head, designed to execute FAS and FAD simultaneously. Due to the limitations in feature representation that arise from relying solely on single-modality image data, we further propose a novel text-guided multi-modal alignment strategy that utilizes text prompts to enrich feature representation, thereby enhancing performance. To combat data scarcity, we build a comprehensive dataset with a wide range of 28 attack types, offering greater potential for a unified framework in facial security. Extensive experiments validate the effectiveness of FaceCat, which generalizes significantly better and obtains excellent robustness against common input transformations.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address two key security issues in facial recognition systems: Face Anti-Spoofing (FAS) and Face Adversarial Detection (FAD). Specifically, the paper seeks to overcome the limitations of existing methods through the following points:

1. **Unified Framework**: Currently, FAS and FAD are typically treated as independent tasks, requiring the deployment of multiple models, which increases computational overhead. The paper proposes a unified framework that can perform both FAS and FAD tasks within a single model.
2. **Insufficient Feature Representation**: Traditional methods for handling FAS and FAD often rely on classification models that mainly focus on the structural features of images, neglecting the rich facial feature representations. The paper utilizes a Diffusion Model to extract rich facial features and designs a hierarchical fusion mechanism to capture these features.
3. **Data Scarcity**: Acquiring facial training data that covers various types of attacks is challenging. The paper constructs a comprehensive dataset, FaceCatData, which includes multiple types of attacks to address the issue of data insufficiency.

Through these improvements, the paper proposes the FaceCat framework, which can simultaneously achieve FAS and FAD tasks within a unified model and demonstrates excellent robustness under various input transformations.
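
The idea of fusing features from several depths of a diffusion model and feeding them to one lightweight head can be illustrated with a toy sketch. This is not the authors' implementation: the three-way label set, the pooling-and-concatenation fusion, the single linear layer, and every function name below are assumptions made for illustration, and random numbers stand in for real diffusion features.

```python
import math
import random

# Hypothetical label set; the source only says the head performs FAS and FAD jointly.
CLASSES = ("live", "spoof", "adversarial")


def global_avg_pool(feature_map):
    """Collapse a [C][H][W] feature map into a length-C vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_hierarchical(feature_maps):
    """Pool each level, then concatenate the pooled vectors (one simple fusion choice)."""
    fused = []
    for fmap in feature_maps:
        fused.extend(global_avg_pool(fmap))
    return fused


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lightweight_head(fused, weights, biases):
    """Single linear layer followed by softmax over CLASSES."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def random_map(rng, channels, size):
    """Random [channels][size][size] tensor standing in for real features."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(channels)
    ]


rng = random.Random(0)
# Stand-ins for features taken from three depths of a diffusion backbone.
levels = [random_map(rng, 4, 8), random_map(rng, 8, 4), random_map(rng, 16, 2)]
fused = fuse_hierarchical(levels)  # 4 + 8 + 16 = 28 values
weights = [[rng.gauss(0.0, 0.1) for _ in fused] for _ in CLASSES]
biases = [0.0 for _ in CLASSES]
probs = lightweight_head(fused, weights, biases)
prediction = CLASSES[probs.index(max(probs))]
```

Because one head emits a single distribution over all outcomes, a deployment built this way would need only one forward pass per image instead of separate FAS and FAD models, which is the practical motivation stated in the abstract.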