Like Teacher, Like Pupil: Transferring Backdoors Via Feature-Based Knowledge Distillation

Jinyin Chen, Zhiqi Cao, Ruoxi Chen, Haibin Zheng, Xiao Li, Qi Xuan, Xing Yang
DOI: https://doi.org/10.1016/j.cose.2024.104041
IF: 5.105
2024-01-01
Computers & Security
Abstract: With the widespread adoption of edge computing, compressing deep neural networks (DNNs) via knowledge distillation (KD) has emerged as a popular technique for resource-limited scenarios. Among the various KD methods, feature-based KD, which uses the feature representations from intermediate layers of the teacher model to supervise the training of the student model, has shown superior performance and is widely applied. However, users often overlook potential backdoor threats when using KD to extract knowledge. To address this issue, this paper makes three contributions: (1) We take a first step toward exploring the security risks of feature-based KD, showing that backdoors implanted in teacher models can survive and transfer to student models. (2) We propose a backdoor attack method targeting feature distillation, achieved by encoding backdoor knowledge into the neuron activations of specific layers. Specifically, we optimize triggers to induce consistent feature map values in the teacher model and transfer the backdoor knowledge to student models through these features. We also design an adaptive defense method against this attack. (3) Extensive experiments on four common datasets and six different teacher-student model pairs validate that our attack outperforms the state-of-the-art (SOTA) baselines, with an average attack success rate roughly 1.5 times that of the baselines (∼×1.5). Additionally, we discuss effective defense methods against such backdoor attacks.
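To make the feature-based KD setting in the abstract concrete, the following is a minimal sketch of a generic feature-matching objective. It is not the paper's attack or training code. It assumes the teacher and student feature maps have already been flattened to equal-length vectors, and the names `feature_distillation_loss`, `total_student_loss`, and the weight `beta` are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def feature_distillation_loss(
    student_feat: Sequence[float], teacher_feat: Sequence[float]
) -> float:
    """Mean squared error between two flattened intermediate feature maps."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same number of elements")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared_errors) / len(squared_errors)


def total_student_loss(
    task_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    beta: float = 1.0,  # hypothetical weight on the feature-matching term
) -> float:
    """Task loss plus a weighted feature-matching term (generic sketch)."""
    return task_loss + beta * feature_distillation_loss(student_feat, teacher_feat)
```

Because the second term pulls the student's intermediate features toward the teacher's, any activation pattern that a trigger consistently induces in the teacher becomes part of the target the student is trained to reproduce, which is the transfer path the abstract describes.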