DST-FRD: A Distillation Method of Swin Transformer for Facial Reenactment Detection

Haotian Wu, Yu Chen, Xin Wang, Lin Wang, Ji Xiang, Liyue Ren
DOI: https://doi.org/10.1109/cscwd61410.2024.10580515
2024-01-01
Abstract: Transformer-based deepfake detection networks have recently shown remarkable performance. However, their computational complexity and parameter counts limit their practical application. To address these issues, we propose a knowledge distillation method for the Swin Transformer network. Specifically, the method uses region predictions on face images to distill the Swin Transformer's knowledge within facial subregions, compensating for the small window size of the Swin Transformer in its early stages. Extensive experiments demonstrate that the proposed distillation method not only reduces the model's parameters and computational cost but also surpasses the teacher network in accuracy on low-resolution images. The student network has significantly lower computational complexity and fewer parameters than the teacher network, at only 17.96% and 20.44% of the teacher's values, respectively. Despite this reduction in complexity and parameters, the student network achieves state-of-the-art results on the FaceForensics++ dataset, surpassing the teacher network by 0.071%, 0.86%, and 8.339% on Raw/Raw, Raw/C23, and Raw/C40, respectively.
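The abstract does not state the distillation loss. Purely as an illustration, the following is a minimal Python sketch of a generic region-wise distillation objective, assuming that each student and teacher patch token carries a feature vector, that a separate face-region predictor assigns each patch a region label, and that feature matching is averaged per region so small regions are not swamped by large ones. The soft-label term is the standard temperature-scaled KD loss of Hinton et al., swapped in as an assumption. All function names, region labels, and the weights `alpha`, `beta`, and `temperature` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def softmax(logits, temperature=1.0):
    # Numerically stable softmax, softened by dividing logits by `temperature`.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    # Hard-label loss of the student against the ground-truth class index.
    return -math.log(softmax(logits)[label])


def soft_label_kd(student_logits, teacher_logits, temperature=4.0):
    # Standard KD term (assumed, not from the paper): KL(teacher || student)
    # on temperature-softened distributions, scaled by T^2.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def region_feature_kd(student_feats, teacher_feats, region_ids):
    # Hypothetical region-wise feature loss: per-patch mean squared error,
    # averaged inside each predicted region, then averaged over regions.
    per_region = defaultdict(list)
    for s_vec, t_vec, region in zip(student_feats, teacher_feats, region_ids):
        mse = sum((s - t) ** 2 for s, t in zip(s_vec, t_vec)) / len(s_vec)
        per_region[region].append(mse)
    region_means = [sum(v) / len(v) for v in per_region.values()]
    return sum(region_means) / len(region_means)


def total_loss(student_logits, teacher_logits, label,
               student_feats, teacher_feats, region_ids,
               alpha=0.5, beta=1.0, temperature=4.0):
    # Illustrative weighting of hard-label, soft-label and region feature terms.
    ce = cross_entropy(student_logits, label)
    kd = soft_label_kd(student_logits, teacher_logits, temperature)
    feat = region_feature_kd(student_feats, teacher_feats, region_ids)
    return (1 - alpha) * ce + alpha * kd + beta * feat


if __name__ == "__main__":
    # Toy example: four patch tokens with 2-D features and hypothetical region labels.
    student_feats = [[0.1, 0.2], [0.0, 0.4], [0.5, 0.5], [0.3, 0.1]]
    teacher_feats = [[0.0, 0.2], [0.0, 0.5], [0.4, 0.7], [0.3, 0.1]]
    region_ids = ["eyes", "eyes", "mouth", "nose"]
    print(round(region_feature_kd(student_feats, teacher_feats, region_ids), 6))  # → 0.01
    print(total_loss([1.5, -0.5], [2.0, -1.0], 0,
                     student_feats, teacher_feats, region_ids))
```

Averaging per region before averaging across regions is one design choice among several; a pixel-weighted or learned per-region weighting would be equally plausible under the abstract's description.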