Robust face anti-spoofing framework with Convolutional Vision Transformer

Yunseung Lee, Youngjun Kwak, Jinho Shin

2023-07-24

Abstract: Owing to the advances in image processing technology and large-scale datasets, companies have implemented facial authentication processes, thereby stimulating increased focus on face anti-spoofing (FAS) against realistic presentation attacks. Recently, various attempts have been made to improve face recognition performance using both global and local learning on face images; however, to the best of our knowledge, this is the first study to investigate whether the robustness of FAS against domain shifts is improved by considering global information and local cues in face images captured using self-attention and convolutional layers. This study proposes a convolutional vision transformer-based framework that achieves robust performance for various unseen domain data. Our model resulted in increases of 7.3 and 12.9 percentage points (%p) in FAS performance compared to models using only a convolutional neural network or vision transformer, respectively. It also shows the highest average rank in sub-protocols of the cross-dataset setting over the other nine benchmark models for domain generalization.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the robustness of Face Anti-Spoofing (FAS) models to domain shifts. It proposes a framework based on the Convolutional Vision Transformer (ConViT) that uses convolutional layers to capture local cues and self-attention to capture global information in face images. The goal is better generalization across datasets, especially when dealing with unseen attack types. The main contributions of the study are:

1. **Proposing a new framework**: ConViT combines the advantages of self-attention and convolution for image feature extraction, which yields better generalization performance.
2. **Improving the label discretization method**: The binary classification problem is recast as a regression problem, with discretized pseudo-labels generated through the CutMix technique, to counter the overfitting seen in traditional binary classification methods.
3. **Excellent experimental results**: The proposed ConViT framework improves FAS performance by 7.3%p and 12.9%p over models that use only a convolutional neural network or only a vision transformer, respectively, and it achieves the highest average rank across the cross-dataset sub-protocols among the nine benchmark domain generalization models it is compared with.

In summary, the paper aims to develop an FAS model that handles domain shift effectively, improving robustness and generalization by integrating local and global information.
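The label discretization idea in contribution 2 can be illustrated with a minimal sketch. This page does not give the paper's exact procedure, so everything below is an assumption made for illustration: the function names `cutmix_pseudo_label` and `discretize`, the label convention (1.0 for live, 0.0 for spoof), the mixing ratio `lam`, and the number of levels `num_levels` are all hypothetical. The sketch pastes a rectangular spoof patch into a live image, takes the fraction of live pixels as a continuous target, and snaps that target to a small set of evenly spaced values that a regression head could be trained on.

```python
import numpy as np


def cutmix_pseudo_label(live, spoof, lam, rng):
    """Paste a rectangular patch of `spoof` into `live` (CutMix-style).

    Assumed convention (not taken from the paper): label 1.0 means live,
    0.0 means spoof, and the mixed label is the fraction of live pixels.
    `lam` is the intended live fraction, between 0 and 1.
    """
    h, w = live.shape[:2]
    # Side lengths chosen so the patch covers roughly (1 - lam) of the image.
    cut_h = int(round(h * np.sqrt(1.0 - lam)))
    cut_w = int(round(w * np.sqrt(1.0 - lam)))
    # Random top-left corner such that the patch stays inside the image.
    top = int(rng.integers(0, h - cut_h + 1))
    left = int(rng.integers(0, w - cut_w + 1))
    mixed = live.copy()
    mixed[top:top + cut_h, left:left + cut_w] = spoof[top:top + cut_h, left:left + cut_w]
    # Recompute the label from the actual patch area, since rounding may shift it.
    live_fraction = 1.0 - (cut_h * cut_w) / (h * w)
    return mixed, live_fraction


def discretize(label, num_levels):
    """Snap a label in [0, 1] to the nearest of `num_levels` evenly spaced values."""
    steps = num_levels - 1
    return float(np.round(label * steps) / steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    live = np.ones((8, 8, 3))    # stand-in for a live face crop
    spoof = np.zeros((8, 8, 3))  # stand-in for a spoof face crop
    mixed, label = cutmix_pseudo_label(live, spoof, lam=0.75, rng=rng)
    print(label, discretize(label, num_levels=5))
```

In this toy setup a quarter of the image is replaced, so the continuous target is 0.75, which already lies on the five-level grid. Under these assumptions the regression target is tied to how much of the image is genuine, which is one plausible reading of how intermediate pseudo-labels could reduce overfitting to hard 0/1 labels.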

Robust Face Recognition by Fusion Local Singular Value Feature and Deformable Model

Selective Domain-Invariant Feature Alignment Network for Face Anti-Spoofing

A Cascade Face Spoofing Detector Based on Face Anti-Spoofing R-CNN and Improved Retinex LBP

Self-Attention and MLP Auxiliary Convolution for Face Anti-Spoofing

Face anti-spoofing with cross-stage relation enhancement and spoof material perception

Face Anti-Spoofing with Human Material Perception

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing

Deep Learning for Face Anti-Spoofing: A Survey

Multi-modal Face Anti-spoofing Using Multi-fusion Network and Global Depth-wise Convolution

S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens

AdvFAS: A robust face anti-spoofing framework against adversarial examples

Dual-Cross Central Difference Network for Face Anti-Spoofing

Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Learning Multi-Granularity Temporal Characteristics for Face Anti-Spoofing

Multi-modal Face Anti-spoofing Using Channel Cross Fusion Network and Global Depth-Wise Convolution

Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

CA-MoEiT: Generalizable Face Anti-spoofing via Dual Cross-Attention and Semi-fixed Mixture-of-Expert