Abstract:Single-shot face anti-spoofing (FAS) is a key technique for securing face recognition systems, relying solely on static images as input. However, single-shot FAS remains a challenging and under-explored problem due to two reasons: 1) On the data side, learning FAS from RGB images is largely context-dependent, and single-shot images without additional annotations contain limited semantic information. 2) On the model side, existing single-shot FAS models struggle to provide proper evidence for their decisions, and FAS methods based on depth estimation require expensive per-pixel annotations. To address these issues, we construct and release a large binocular NIR image dataset named BNI-FAS, which contains more than 300,000 real face and plane attack images, and propose an Interpretable FAS Transformer (IFAST) that requires only weak supervision to produce interpretable predictions. Our IFAST generates pixel-wise disparity maps using the proposed disparity estimation Transformer with Dynamic Matching Attention (DMA) blocks. Besides, we design a confidence map generator to work in tandem with a dual-teacher distillation module to obtain the final discriminant results. Comprehensive experiments show that our IFAST achieves state-of-the-art performance on BNI-FAS, verifying its effectiveness of single-shot FAS on binocular NIR images. The project page is available at https://ifast-bni.github.io/.

IFAST: Weakly Supervised Interpretable Face Anti-Spoofing From Single-Shot Binocular NIR Images