CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack

Hanfeng Xia, Haibo Hong, Ruili Wang
2024-06-23
Abstract: Backdoor attacks involve injecting a limited number of poisoned examples containing triggers into the training dataset. At inference time, a backdoored model maintains high accuracy on normal examples, yet when presented with trigger-containing instances it may erroneously predict them as the target class designated by the attacker. This paper explores how to mitigate the risks of backdoor attacks by filtering out poisoned samples. We primarily leverage two key characteristics of backdoor attacks: multiple backdoors can exist simultaneously within a single model, and, as discovered through the Composite Backdoor Attack (CBA), assigning a new target label to the combination of two triggers in a sample does not compromise the original functionality of each trigger, yet causes data containing both triggers to be predicted as the new target class. Building on these observations, we propose a novel three-stage poisoned-data filtering approach, Composite Backdoor Poison Filtering (CBPF). First, using the observed differences in output between poisoned and clean samples, a subset of the data is partitioned to include both poisoned and clean instances. Subsequently, benign triggers are incorporated and labels are adjusted to create new target and benign target classes, so that poisoned and clean data are classified as distinct classes during the inference stage. Experimental results indicate that CBPF successfully filters out malicious data produced by six advanced attacks on CIFAR10 and ImageNet-12. On average, CBPF attains a filtering success rate of 99.91% for the six attacks on CIFAR10. Additionally, the model trained on the uncontaminated samples sustains high accuracy.
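To make the threat model concrete, here is a minimal sketch of trigger-based data poisoning in the style of a BadNets-like patch trigger, together with the CBA labeling rule described in the abstract. The patch trigger, its bottom-right placement, the poisoning rate, and the helper names `stamp_trigger`, `poison`, and `composite_label` are hypothetical illustrations chosen for this sketch; the abstract does not name the six attacks or give their parameters, and this is not the authors' code.

```python
import numpy as np

def stamp_trigger(image, patch):
    """Return a copy of `image` (H, W, C) with `patch` (h, w, C) pasted into
    the bottom-right corner (hypothetical trigger placement)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[-h:, -w:] = patch
    return out

def poison(images, labels, target, patch, rate, seed=0):
    """Stamp the trigger on a random `rate` fraction of the samples and
    relabel them to the attacker's `target` class. Inputs are not modified."""
    rng = np.random.default_rng(seed)
    n = len(images)
    idx = rng.choice(n, size=int(round(rate * n)), replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = stamp_trigger(images[i], patch)
        labels[i] = target
    return images, labels, idx

def composite_label(has_a, has_b, clean_label, target_a, target_b, target_ab):
    """CBA-style labeling rule: each trigger alone keeps its own target,
    while the two triggers together map to a new target class."""
    if has_a and has_b:
        return target_ab
    if has_a:
        return target_a
    if has_b:
        return target_b
    return clean_label
```

For example, poisoning 10% of 100 blank 32x32 images with a 3x3 white patch and target class 7 relabels exactly 10 samples to class 7 and leaves the other 90 untouched.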
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **the threat that backdoor attacks pose to deep-learning models**. A backdoor attack injects a small number of malicious samples containing triggers into the training dataset, so that at inference time the model keeps a high accuracy on normal samples but misclassifies any sample containing the trigger as the attacker's target class. Such attacks undermine the trustworthiness of the model and can be exploited for malicious purposes. To meet this challenge, the paper proposes a new method named **Composite Backdoor Poison Filtering (CBPF)**, which filters malicious samples out of the training set through the following steps:

1. **Exploit the properties of the Composite Backdoor Attack (CBA)**: multiple backdoors can coexist in a single model, and when two triggers are combined and assigned a new target label, each trigger keeps its original behavior while samples carrying both triggers are predicted as the new target class. CBPF uses this property to distinguish poisoned samples from clean samples.
2. **Three-stage poisoned-data filtering process**:
   - **First stage**: Using the difference in model output between poisoned and clean samples (measured by the ΔTop2diff metric), preliminarily select a subset of poisoned samples and a subset of clean samples.
   - **Second stage**: Add benign triggers to the selected samples and adjust their labels to create a new target class and a benign target class, so that poisoned and clean samples are classified as distinct classes during the inference stage.
   - **Third stage**: Retrain the model on the dataset with the added benign triggers to further separate poisoned samples from clean samples.
3. **Experimental verification**: The paper evaluates CBPF on two datasets, CIFAR10 and ImageNet-12, against six different backdoor attacks.
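The first two stages and the final decision rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not define ΔTop2diff, so the sketch assumes a per-sample margin between the top-1 and top-2 softmax probabilities and assumes that poisoned samples show unusually large margins. It also assumes the CBA-style reading of stage two, in which suspected-poisoned samples stamped with a benign trigger are labeled as the new target class and likely-clean samples stamped with the same trigger are labeled as the benign target class. The top-left trigger placement and all function names are hypothetical; stage three's retraining is omitted, and only its final decision rule (flag samples that the retrained model assigns to the new target once the benign trigger is added) is shown.

```python
import numpy as np

def softmax(logits):
    """Row-wise numerically stable softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def top2_margin(logits):
    """Assumed stand-in for Top2diff: top-1 minus top-2 probability per sample."""
    p = np.sort(softmax(logits), axis=1)
    return p[:, -1] - p[:, -2]

def partition(logits, k):
    """Stage 1 sketch: the k largest margins are treated as suspected poisoned,
    the k smallest as likely clean."""
    order = np.argsort(top2_margin(logits), kind="stable")
    return order[-k:], order[:k]

def stamp_top_left(image, patch):
    """Paste a benign trigger into the top-left corner (hypothetical placement)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[:h, :w] = patch
    return out

def build_stage2_set(images, suspect_idx, clean_idx, benign_patch,
                     new_target, benign_target):
    """Stage 2 sketch: benign-triggered suspects get `new_target`,
    benign-triggered clean samples get `benign_target`."""
    idx = np.concatenate([suspect_idx, clean_idx])
    x = np.stack([stamp_top_left(images[i], benign_patch) for i in idx])
    y = np.array([new_target] * len(suspect_idx) + [benign_target] * len(clean_idx))
    return x, y

def flag_poisoned(preds_with_benign_trigger, new_target):
    """Stage 3 decision rule: indices predicted as the new target class."""
    return np.flatnonzero(np.asarray(preds_with_benign_trigger) == new_target)
```

The selection keeps only the most and least confident samples because the extremes are the ones the margin heuristic is most likely to label correctly; the middle of the distribution is left for the retrained model to separate.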
The experimental results show that CBPF filters poisoned data very effectively, achieving an average filtering success rate of 99.91% across the six attacks on CIFAR10, while models trained on the remaining uncontaminated samples maintain high accuracy. In conclusion, the main contribution of this paper is an effective composite backdoor poison filtering method that accurately filters out poisoned samples without relying on additional clean data, thereby protecting deep-learning models from backdoor attacks.
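The headline 99.91% figure is a filtering success rate. The summary does not give its formula, so the sketch below assumes the common reading: the fraction of truly poisoned training samples that the filter removes, paired with the fraction of clean samples it keeps (which bears on the accuracy of the model retrained on the retained data). The helper `filtering_metrics` and both definitions are hypothetical illustrations, not taken from the paper.

```python
def filtering_metrics(flagged, poisoned, n_total):
    """Return (filtering success rate, clean retention rate).

    flagged:  indices the filter removed
    poisoned: ground-truth poisoned indices
    n_total:  size of the training set
    """
    flagged, poisoned = set(flagged), set(poisoned)
    # Share of poisoned samples that were caught.
    caught = len(flagged & poisoned)
    fsr = caught / len(poisoned) if poisoned else 1.0
    # Share of clean samples that survived filtering.
    clean_total = n_total - len(poisoned)
    clean_removed = len(flagged - poisoned)
    retention = 1 - clean_removed / clean_total if clean_total else 1.0
    return fsr, retention
```

For instance, if 4 of 100 samples are poisoned and the filter removes three of them plus one clean sample, the success rate is 0.75 and the clean retention rate is 95/96.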