Abstract: Backdoor attacks inject a small number of poisoned examples containing triggers into the training dataset. At inference time, the backdoored model maintains high accuracy on normal examples, yet when presented with trigger-containing instances it may erroneously predict them as the target class designated by the attacker. This paper explores how to mitigate the risk of backdoor attacks by filtering out poisoned samples. We primarily leverage two key characteristics of backdoor attacks: multiple backdoors can exist simultaneously within a single model, and, as shown by the Composite Backdoor Attack (CBA), altering two triggers in a sample to new target labels does not compromise the original functionality of the triggers, yet enables the prediction of the data as a new target class when both triggers are present simultaneously. We therefore propose a novel three-stage poisoned-data filtering approach, Composite Backdoor Poison Filtering (CBPF). First, using the identified differences in output between poisoned and clean samples, a subset of the data is partitioned to include both poisoned and clean instances. Subsequently, benign triggers are added and labels are adjusted to create new target and benign target classes, so that poisoned and clean data are classified into distinct classes during the inference stage. The experimental results indicate that CBPF successfully filters out the malicious data produced by six advanced attacks on CIFAR10 and ImageNet-12. On average, CBPF attains a filtering success rate of 99.91% for the six attacks on CIFAR10. Additionally, the model trained on the uncontaminated samples maintains high accuracy.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is **the threat of backdoor attacks to deep-learning models**. Specifically, backdoor attacks inject a small number of malicious samples containing triggers into the training dataset, so that the model maintains high accuracy on normal samples during the inference stage while misclassifying samples that contain a specific trigger as the attacker's target category. This type of attack not only undermines the credibility of the model but can also be used for malicious purposes.

To meet this challenge, the paper proposes a new method named **Composite Backdoor Poison Filtering (CBPF)**. CBPF aims to filter malicious samples out of the training set through the following steps:

1. **Utilize the characteristics of Composite Backdoor Attack (CBA)**: CBA allows multiple backdoor triggers to exist simultaneously in one sample, and these triggers can point to different target categories. CBPF uses this property to distinguish poisoned samples from clean samples.
2. **Three-stage poisoned data filtering process**:
   - **First stage**: By calculating the difference in output between poisoned samples and clean samples (using the ΔTop2diff metric), preliminarily screen out a subset of poisoned samples and a subset of clean samples.
   - **Second stage**: Add benign triggers to the screened-out dataset and adjust the labels to create new target categories and benign target categories. This step causes poisoned samples and clean samples to be assigned to different classes during the inference stage.
   - **Third stage**: Retrain the model on the dataset with benign triggers added to further separate poisoned samples from clean samples.
3. **Experimental verification**: The paper conducted experiments on two datasets, CIFAR10 and ImageNet-12, and tested the defense effect of CBPF against six different backdoor attacks.

The experimental results show that CBPF performs well at filtering poisoned data, with an average filtering success rate of 99.91% across the six attacks on CIFAR10, and that models trained on the uncontaminated samples maintain high accuracy.

In conclusion, the main contribution of this paper is an effective composite backdoor poison filtering method that can accurately filter out poisoned samples without relying on additional clean data, thereby protecting deep-learning models from backdoor attacks.
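The first two stages described above can be illustrated with a rough sketch. It is not the paper's implementation: the summary does not define ΔTop2diff, so the code assumes it is the gap between the two largest class probabilities for each sample. The function names, the corner-patch benign trigger, the selection fraction, and the choice of label indices are all hypothetical. The sketch also does not claim which end of the gap ranking corresponds to poisoned samples; the caller decides which indices to treat as suspected poisoned and which as suspected clean.

```python
import numpy as np

def top2_gap(probs):
    """Per-sample gap between the largest and second-largest probability.
    Assumption: a stand-in for the paper's ΔTop2diff, which is not defined here."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def split_by_gap(probs, frac=0.1):
    """Return (highest-gap indices, lowest-gap indices), each a `frac` share of the data."""
    order = np.argsort(top2_gap(probs))
    k = max(1, int(len(order) * frac))
    return order[-k:], order[:k]

def stamp_benign_trigger(images, size=3, value=1.0):
    """Hypothetical benign trigger: a square patch in the top-left corner (NHWC layout)."""
    out = images.copy()
    out[:, :size, :size, :] = value
    return out

def build_stage2_set(images, suspect_idx, clean_idx, num_classes):
    """Stamp the benign trigger on both subsets and relabel them.
    Assumption: suspected-poisoned samples get a new target class (num_classes)
    and suspected-clean samples get a benign target class (num_classes + 1)."""
    idx = np.concatenate([suspect_idx, clean_idx])
    stamped = stamp_benign_trigger(images[idx])
    labels = np.concatenate([
        np.full(len(suspect_idx), num_classes),
        np.full(len(clean_idx), num_classes + 1),
    ])
    return stamped, labels

# Toy usage with random stand-ins for model outputs and images.
rng = np.random.default_rng(0)
toy_probs = rng.dirichlet(np.ones(10), size=100)   # 100 samples, 10 classes
toy_images = rng.random((100, 32, 32, 3))
high_idx, low_idx = split_by_gap(toy_probs, frac=0.1)
stage2_x, stage2_y = build_stage2_set(toy_images, high_idx, low_idx, num_classes=10)
```

In this sketch the third stage would amount to retraining a classifier with two extra output classes on `stage2_x` and `stage2_y` together with the original data, then using its predictions on trigger-stamped training samples to separate the two groups; that step is omitted because the summary gives no further detail on it.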

CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers

UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy

Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

Protecting against simultaneous data poisoning attacks

Universal Backdoor Attacks

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Backdoor Defense Via Deconfounded Representation Learning

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Efficient Any-Target Backdoor Attack with Pseudo Poisoned Samples