Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation

Yinbin Miao, Xiaohua Jia, Jinxiu Jiang, Hongcheng Xie, Yu Guo, Rongfang Bie
DOI: https://doi.org/10.1109/TSC.2024.3417279
IF: 11.019
2024-09-01
IEEE Transactions on Services Computing
Abstract: Pre-trained encoders in computer vision have recently received great attention from both research and industry communities. Among others, a promising paradigm is to utilize self-supervised learning (SSL) to train image encoders with massive unlabeled samples, thereby endowing encoders with the capability to embed abundant knowledge into the feature representations. Backdoor attacks on SSL disrupt the encoder's feature extraction capabilities, causing downstream classifiers to inherit backdoor behavior and leading to misclassification. Existing backdoor defense methods primarily focus on supervised learning scenarios and cannot be effectively migrated to SSL pre-trained encoders. In this article, we present a backdoor defense scheme based on self-supervised knowledge distillation. Our approach aims to eliminate backdoors while preserving the feature extraction capability using the downstream dataset. We incorporate the benefits of contrastive and non-contrastive SSL methods for knowledge distillation, ensuring differentiation between the representations of various classes and the consistency of representations within the same class. Consequently, the extraction capability of pre-trained encoders is preserved. Extensive experiments against multiple attacks demonstrate that the proposed scheme outperforms the state-of-the-art solutions.
Computer Science