ARTEMIS: Defending Against Backdoor Attacks Via Distribution Shift

Meng Xue,Zhixian Wang,Qian Zhang,Xueluan Gong,Zhihang Liu,Yanjiao Chen
DOI: https://doi.org/10.1109/tdsc.2024.3472117
2024-01-01
IEEE Transactions on Dependable and Secure Computing
Abstract:Backdoor attacks can exploit vulnerabilities in the training process of Deep Neural Networks (DNNs), introducing hidden malicious functionality that can be activated by a specific input pattern. Existing defenses typically rely on the assumption of a significant difference between poisoned and clean samples. However, subtle differences in dynamic and low poisoning ratio attacks can conceal poisoning features, thereby evading defenses. In this work, we propose a novel backdoor defense approach called ARTEMIS, which utilizes distribution shifts to eliminate the discrepancy between poisoned and benign samples in the feature space. Additionally, ARTEMIS learns from domain-invariant features after the shift. To further enhance the purification of backdoors in DNNs, we incorporate soft knowledge distillation into ARTEMIS to guide the alignment of features between the source dataset domain and the generated distribution-shift dataset domain. We extensively compare our proposed method with 5 state-of-the-art (SOTA) defensive techniques under 9 SOTA attacks on 4 datasets to demonstrate its effectiveness and robustness. Our results show that our method, ARTEMIS, can successfully purify dynamic and low poisoning ratio backdoor attacks and outperform existing defenses by a significant margin. We will release our code as open-source upon publication.
What problem does this paper attempt to address?