Abstract: Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering ever-evolving trigger functions and often lead to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples in terms of the presence of label transition of the intermediary images, detect the target label by quantifying distribution discrepancy, select their purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods.
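
The forward process mentioned in the abstract can be illustrated with the standard DDPM closed form, `x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps`. The sketch below is not DataElixir's implementation. It assumes a linear noise schedule with the common DDPM defaults (1000 steps, beta from 1e-4 to 0.02), and every function name in it is hypothetical. It only shows why more forward steps leave less of the original pixel signal, trigger pattern included.

```python
import math
import random


def linear_alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t (assumed linear schedule)."""
    alpha_bar = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        alpha_bar *= 1.0 - beta
    return alpha_bar


def signal_scale(t, num_steps=1000):
    """Factor by which the original image (and any trigger) is scaled at step t."""
    return math.sqrt(linear_alpha_bar(t, num_steps))


def forward_noise(pixels, t, rng, num_steps=1000):
    """Draw x_t from q(x_t | x_0) for a flat list of pixel values."""
    a_bar = linear_alpha_bar(t, num_steps)
    signal = math.sqrt(a_bar)
    noise = math.sqrt(1.0 - a_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.2, 0.8, 1.0, 1.0]  # toy "image"; the last two pixels stand in for a trigger
    for t in (0, 100, 500, 1000):
        noisy = forward_noise(image, t, rng)
        print(t, round(signal_scale(t), 4), [round(v, 3) for v in noisy])
```

Under this assumed schedule, `signal_scale(t)` falls from 1.0 at `t = 0` to a value close to zero at `t = 1000`. A reverse process run from an intermediate step therefore has to regenerate much of the image, which is the intuition behind removing trigger features while restoring benign ones.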

What problem does this paper attempt to address?

The problem this paper addresses is how to effectively purify a dataset contaminated by a backdoor attack, so that the attack is mitigated while the model's accuracy on benign data is maintained. Specifically, the paper proposes a new dataset purification method, **DataElixir**, which uses diffusion models to eliminate trigger features and restore benign features, thereby converting contaminated samples into benign ones.

### Main problems and challenges

1. **Limitations of existing methods**:
   - Existing dataset purification methods have limited effectiveness against evolving trigger functions.
   - These methods often lead to a significant decrease in benign accuracy.
   - For example, methods based on model activations, spectral signatures, and loss values perform poorly on specific types of trigger functions.
2. **Characteristics of backdoor attacks**:
   - Backdoor attacks inject malicious samples carrying trigger features into the training data, causing the victim model to make incorrect predictions when it encounters those trigger features.
   - During training, contaminated samples cause the model to overfit to the trigger features, so the model behaves normally on benign inputs but makes incorrect predictions on contaminated inputs.

### Solutions

The main contributions of **DataElixir** are:

1. **Purification using diffusion models**:
   - Introduce noise through the forward process of the diffusion model to eliminate trigger features.
   - Restore the benign features of the image through the reverse process, so that the image can be reclassified under its correct label.
   - The specific steps include multiple iterations of the forward and reverse processes, extracting intermediate images and their predicted labels, identifying abnormal samples, detecting target labels, and selecting purified images.
2. **Efficient detection and purification**:
   - **Candidate set construction**: Perform multiple rounds of forward and reverse processes on each sample, extract the intermediate images, and construct a candidate set.
   - **Abnormal sample identification**: Identify benign, contaminated, and suspicious samples by analyzing label changes within the candidate set.
   - **Target label detection**: Detect the target label by quantifying the distribution difference.
   - **Purified dataset generation**: Select the purified images and determine their true labels to generate a purified dataset.

### Experimental results

- **Performance comparison**: Compared with four baseline defense methods (AC, Spectral, ABL, DBD), **DataElixir** is reported to achieve the highest true positive rate (TPR), the lowest false positive rate (FPR), the highest accuracy (ACC), and the lowest attack success rate (ASR) across 9 popular backdoor attacks.
- **Target label detection**: **DataElixir** accurately detects target labels while avoiding false positives on benign labels.

### Summary

**DataElixir** purifies datasets contaminated by backdoor attacks using the forward and reverse processes of a diffusion model. It significantly reduces the success rate of backdoor attacks while having minimal impact on the model's accuracy on benign data, offering a new and effective approach to defending against backdoor attacks.
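
The abnormal sample identification and target label detection steps described above can be sketched roughly as follows. This is an illustrative toy, not the paper's algorithm: the summary does not state the decision rule, so the `agree_ratio` threshold, the three-way rule in `categorize`, and the over-representation proxy in `guess_target_label` are all assumptions made for the example. The inputs are the labels a classifier predicted for each intermediate image in a sample's candidate set.

```python
from collections import Counter


def categorize(given_label, candidate_labels, agree_ratio=0.8):
    """Classify one sample from the labels predicted on its candidate set.

    Assumed rule (hypothetical, not taken from the paper):
      - no label transition at all                -> "benign"
      - a consistent transition to one other label -> "poisoned"
      - anything in between                        -> "suspicious"
    Returns (category, recovered_label_or_None).
    """
    if not candidate_labels:
        raise ValueError("candidate set is empty")
    if all(lbl == given_label for lbl in candidate_labels):
        return "benign", given_label
    others = [lbl for lbl in candidate_labels if lbl != given_label]
    top_label, top_count = Counter(others).most_common(1)[0]
    if top_count / len(candidate_labels) >= agree_ratio:
        return "poisoned", top_label
    return "suspicious", None


def guess_target_label(given_labels, categories):
    """Crude proxy for target label detection (assumption for illustration).

    Picks the given label with the largest fraction of samples flagged as
    "poisoned"; the paper instead quantifies a distribution discrepancy.
    """
    totals = Counter(given_labels)
    flagged = Counter(g for g, c in zip(given_labels, categories) if c == "poisoned")
    if not flagged:
        return None
    return max(flagged, key=lambda lbl: flagged[lbl] / totals[lbl])


if __name__ == "__main__":
    # Each entry: (label stored in the dataset, labels predicted on intermediate images)
    dataset = [
        (0, [0, 5, 5, 5, 5]),   # label flips consistently away from 0
        (0, [0, 0, 0, 0, 0]),   # never flips
        (1, [1, 1, 4, 2, 1]),   # flips inconsistently
    ]
    results = [categorize(label, cands) for label, cands in dataset]
    print(results)
    print(guess_target_label([d[0] for d in dataset], [r[0] for r in results]))
```

In this toy setup, a sample whose predicted label moves away from its stored label and stays on a single other label is treated as a likely poisoned sample, and that other label is a natural first guess for its ground-truth label.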

DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

ARTEMIS: Defending Against Backdoor Attacks Via Distribution Shift

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples

Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data

ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

DP2: Dataset Protection by Data Poisoning

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

Progressive Poisoned Data Isolation for Training-time Backdoor Defense

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

Poison-Resilient Anomaly Detection: Mitigating Poisoning Attacks in Semi-Supervised Encrypted Traffic Anomaly Detection

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models

Poisoning Attacks and Data Sanitization Mitigations for Machine Learning Models in Network Intrusion Detection Systems

Poisoning Web-Scale Training Datasets is Practical

Backdoor Defense via Adaptively Splitting Poisoned Dataset

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

How to Sift Out a Clean Data Subset in the Presence of Data Poisoning?