Efficient Face Anti-Spoofing Via Head-Aware Transformer Based Knowledge Distillation with 5 MB Model Parameters

Jun Zhang,Yunfei Zhang,Feixue Shao,Xuetao Ma,Shu Feng,Yongfei Wu,Daoxiang Zhou
DOI: https://doi.org/10.1016/j.asoc.2024.112237
IF: 8.7
2024-01-01
Applied Soft Computing
Abstract:Although face recognition technology has been applied in many scenarios, it still suffers from many types of presentation attacks, so face anti-spoofing (FAS) becomes a hot topic in computer vision. Recently, vision transformer is recognized as the mainstream architecture for FAS, which always relies on auxiliary information, sophisticated tricks and huge model parameters. Considering that face based identity authentication usually takes place on mobile-like devices, therefore how to design an effective and lightweight model is of great significance. Inspired by the powerful global modeling ability of self-attention and the model compression ability of knowledge distillation, a simple yet effective knowledge distillation approach is proposed for FAS under transformer framework. Our primary idea is to leverage the rich knowledge of a teacher network pre- trained on large-scale face data to guide the learning of a lightweight student network. The main contributions of our method are threefold: (1) Feature- and logits-level distillation are combined to transfer the rich knowledge of teacher to student. (2) A head-aware strategy is proposed to deal with the dimension mismatching issue of middle encoder layers between teacher and student networks, in which a novel attention head correlation matrix is introduced. (3) Our method can bridge the performance gap between teacher and student, and the resulting student network is extremely lightweight with only 5 MB parameters. Extensive experiments are conducted on three public face-spoofing datasets, CASIA-FASD, Replay-Attack and OULU-NPU, the results demonstrate that our method can obtain performance on par with or superior to most FAS methods and outperform many knowledge distillation methods. Meanwhile, the distilled student network achieves excellent performance with 17x x fewer parameters and 9x x faster inference time compared to the teacher network. The code will be publicly available at https://github.com/Maricle-zhangjun/HaTFAS.
What problem does this paper attempt to address?