CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei
2024-03-21
Abstract: Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains. Existing methods either rely on domain labels to align domain-invariant feature spaces, or disentangle generalizable features from the whole sample, either of which inevitably distorts semantic feature structures and limits generalization. In this work, we make use of large-scale VLMs like CLIP and leverage the textual feature to dynamically adjust the classifier's weights for exploring generalizable visual features. Specifically, we propose a novel Class Free Prompt Learning (CFPL) paradigm for DG FAS, which utilizes two lightweight transformers, namely Content Q-Former (CQF) and Style Q-Former (SQF), to learn different semantic prompts conditioned on content and style features, respectively, using a set of learnable query vectors. The generalizable prompt is then learned through two improvements: (1) A Prompt-Text Matched (PTM) supervision is introduced to ensure CQF learns the visual representation that is most informative of the content description. (2) A Diversified Style Prompt (DSP) technology is proposed to diversify the learning of style prompts by mixing feature statistics between instance-specific styles. Finally, the learned text features modulate the visual features through the designed Prompt Modulation (PM) to improve generalization. Extensive experiments show that CFPL is effective and outperforms state-of-the-art methods on several cross-domain datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses Face Anti-Spoofing (FAS) in cross-domain scenarios, where a model must perform well on domains it has not seen during training. To cope with distribution differences between domains, existing methods either rely on domain labels to align domain-invariant feature spaces or disentangle generalizable features from the whole sample. Both strategies distort the structure of semantic features and generalize poorly.

The paper proposes Class Free Prompt Learning (CFPL), which uses the text features of a large-scale vision-language model (such as CLIP) to dynamically adjust the classifier's weights and thereby explore generalizable visual features. The approach has four parts:

1. **Content and Style Prompt Learning**: Two lightweight Transformers, the Content Q-Former (CQF) and the Style Q-Former (SQF), learn different semantic prompts conditioned on content and style features, each using a set of learnable query vectors.
2. **Prompt-Text Matching Supervision**: A Prompt-Text Matched (PTM) supervision ensures that the CQF learns the visual representation that is most informative of the content description.
3. **Diversified Style Prompt Techniques**: The Diversified Style Prompt (DSP) technique diversifies the learning of style prompts by mixing feature statistics between instance-specific styles.
4. **Prompt Modulation Function**: The learned text features modulate the visual features through a designed Prompt Modulation (PM) function to improve generalization.

With these components, the paper reports that CFPL is effective and outperforms existing state-of-the-art methods on several cross-domain datasets.
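
The idea of mixing feature statistics between instance-specific styles can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. It assumes that "style" is summarized by the channel-wise mean and standard deviation of a feature map, and that statistics are mixed across instances in a batch with a Beta-distributed weight, in the spirit of MixStyle-like augmentation. The function names, the `alpha` parameter, and the tensor layout are hypothetical choices made for illustration.

```python
import numpy as np


def style_stats(feat, eps=1e-6):
    """Channel-wise mean and std of a feature map shaped (batch, channels, H, W).

    Assumption: these two statistics stand in for the instance-specific style.
    """
    mu = feat.mean(axis=(2, 3))
    sigma = np.sqrt(feat.var(axis=(2, 3)) + eps)
    return mu, sigma


def mix_style_stats(feat, alpha=0.1, rng=None):
    """Mix each instance's style statistics with those of a random other instance.

    Hypothetical sketch: the mixing weight is drawn from Beta(alpha, alpha),
    which is an assumption and not a detail taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, sigma = style_stats(feat)
    batch = feat.shape[0]
    perm = rng.permutation(batch)                  # partner instance for each sample
    lam = rng.beta(alpha, alpha, size=(batch, 1))  # per-sample mixing weight in [0, 1]
    mixed_mu = lam * mu + (1.0 - lam) * mu[perm]
    mixed_sigma = lam * sigma + (1.0 - lam) * sigma[perm]
    return mixed_mu, mixed_sigma


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(4, 3, 8, 8))
    mixed_mu, mixed_sigma = mix_style_stats(features, rng=np.random.default_rng(1))
    print(mixed_mu.shape, mixed_sigma.shape)
```

In a setup like this, the mixed statistics would be what a style branch (such as the SQF) is conditioned on, so that the style prompts see a wider range of styles than the training batch contains. How CFPL actually injects the mixed statistics is not specified in the text above.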