2D-Malafide: Adversarial Attacks Against Face Deepfake Detection Systems

Chiara Galdi, Michele Panariello, Massimiliano Todisco, Nicholas Evans
2024-08-26
Abstract: We introduce 2D-Malafide, a novel and lightweight adversarial attack designed to deceive face deepfake detection systems. Building upon the concept of 1D convolutional perturbations explored in the speech domain, our method leverages 2D convolutional filters to craft perturbations which significantly degrade the performance of state-of-the-art face deepfake detectors. Unlike traditional additive noise approaches, 2D-Malafide optimises a small number of filter coefficients to generate robust adversarial perturbations which are transferable across different face images. Experiments, conducted using the FaceForensics++ dataset, demonstrate that 2D-Malafide substantially degrades detection performance in both white-box and black-box settings, with larger filter sizes having the greatest impact. Additionally, we report an explainability analysis using GradCAM which illustrates how 2D-Malafide misleads detection systems by altering the image areas used most for classification. Our findings highlight the vulnerability of current deepfake detection systems to convolutional adversarial attacks as well as the need for future work to enhance detection robustness through improved image fidelity constraints.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the vulnerability of face deepfake detection systems to adversarial attacks. It proposes 2D-Malafide, a novel and lightweight attack that uses 2D convolutional filters to craft adversarial perturbations which significantly degrade the performance of state-of-the-art face deepfake detectors. Unlike traditional additive noise methods, 2D-Malafide optimises a small number of filter coefficients, producing robust perturbations that transfer across different face images. Experiments on the FaceForensics++ dataset show that the attack substantially degrades detection performance in both white-box and black-box settings, with larger filter sizes having the greatest impact. A GradCAM explainability analysis illustrates how 2D-Malafide misleads detection systems by altering the image areas they use most for classification. These findings highlight the vulnerability of current deepfake detection systems to convolutional adversarial attacks and the need for future work to enhance detection robustness through improved image fidelity constraints.
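To make the idea of a convolutional, rather than additive, perturbation concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The toy linear detector, the loss (the mean detector score), the plain gradient-descent update, and all function names are assumptions made for illustration; the paper attacks real deepfake detectors and mentions image fidelity constraints, neither of which is modelled here. The sketch only shows that one small filter, shared across several images, can be optimised to lower a detector's "fake" score.

```python
import random


def conv2d_same(img, filt):
    """Cross-correlate a 2D image with a square odd-sized filter (zero padding)."""
    h, w, k = len(img), len(img[0]), len(filt)
    p = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - p, j + b - p
                    if 0 <= y < h and 0 <= x < w:
                        acc += filt[a][b] * img[y][x]
            out[i][j] = acc
    return out


def score(img, weights):
    """Hypothetical linear detector: a higher score means 'more likely fake'."""
    return sum(weights[i][j] * img[i][j]
               for i in range(len(img)) for j in range(len(img[0])))


def identity_filter(k):
    """Filter that leaves the image unchanged; used as the starting point."""
    f = [[0.0] * k for _ in range(k)]
    f[k // 2][k // 2] = 1.0
    return f


def score_gradient(img, weights, k):
    """Gradient of score(conv2d_same(img, f)) with respect to the k*k coefficients.

    For this linear toy detector the gradient does not depend on f itself.
    """
    h, w, p = len(img), len(img[0]), k // 2
    grad = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            for i in range(h):
                for j in range(w):
                    y, x = i + a - p, j + b - p
                    if 0 <= y < h and 0 <= x < w:
                        grad[a][b] += weights[i][j] * img[y][x]
    return grad


def fit_filter(images, weights, k=3, steps=20, lr=0.01):
    """Optimise one shared filter to lower the mean detector score over all images."""
    filt = identity_filter(k)
    for _ in range(steps):
        for img in images:
            g = score_gradient(img, weights, k)
            for a in range(k):
                for b in range(k):
                    # Averaged gradient-descent step on the k*k coefficients only.
                    filt[a][b] -= lr * g[a][b] / len(images)
    return filt


if __name__ == "__main__":
    random.seed(0)
    images = [[[random.random() for _ in range(6)] for _ in range(6)]
              for _ in range(4)]
    weights = [[random.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(6)]
    filt = fit_filter(images, weights)
    before = sum(score(im, weights) for im in images) / len(images)
    after = sum(score(conv2d_same(im, filt), weights) for im in images) / len(images)
    print("mean score before:", before, "after:", after)
```

Only the k*k filter coefficients are optimised, so the number of free parameters is independent of the image size and the same filter applies to any image. That is the sense in which such an attack is lightweight and not tied to a single input. Because nothing in this toy version limits how far the filter drifts from the identity, the filtered images may be visibly distorted.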