DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

Jiachen Zhou, Peizhuo Lv, Yibing Lan, Guozhu Meng, Kai Chen, Hualong Ma
2023-12-20
Abstract: Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering ever-evolving trigger functions and often lead to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples in terms of the presence of label transition of the intermediary images, detect the target label by quantifying distribution discrepancy, select their purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods.
Cryptography and Security, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to effectively purify a dataset contaminated by backdoor attacks, mitigating the attacks while maintaining the model's accuracy on benign data. Specifically, the paper proposes a new dataset purification method, **DataElixir**, which uses diffusion models to eliminate trigger features and restore benign features, thereby converting contaminated samples into benign ones.

### Main problems and challenges

1. **Limitations of existing methods**:
   - Existing dataset purification methods have limited effectiveness against evolving trigger functions.
   - These methods often cause a significant decrease in benign accuracy.
   - For example, methods based on model activations, spectral signatures, and loss values perform poorly on specific types of trigger functions.
2. **Characteristics of backdoor attacks**:
   - Backdoor attacks inject malicious samples carrying trigger features into the training data, causing the victim model to make incorrect predictions when it encounters these trigger features.
   - Contaminated samples cause the model to overfit to the trigger features during training, so the model performs normally on benign inputs but makes incorrect predictions on contaminated inputs.

### Solutions

The main contributions of **DataElixir** are:

1. **Purification using diffusion models**:
   - Introduce noise through the forward process of the diffusion model to eliminate trigger features.
   - Restore the benign features of the image through the reverse process, so that the image can be reclassified to its correct label.
   - Specific steps include multiple iterations of the forward and reverse processes, extracting intermediate images and their predicted labels, identifying abnormal samples, detecting target labels, and selecting purified images.
2. **Efficient detection and purification**:
   - **Candidate set construction**: Perform multiple rounds of forward and reverse processes on each sample, extract intermediate images, and construct a candidate set.
   - **Abnormal sample identification**: Identify benign, contaminated, and suspicious samples by analyzing label changes within the candidate set.
   - **Target label detection**: Detect the target label by quantifying the distribution discrepancy.
   - **Purified dataset generation**: Select the purified images and determine their true labels to generate a purified dataset.

### Experimental results

- **Performance comparison**: Compared with four baseline defense methods (AC, Spectral, ABL, DBD), **DataElixir** performs best on 9 popular backdoor attacks, with the highest true positive rate (TPR), the lowest false positive rate (FPR), the highest accuracy (ACC), and the lowest attack success rate (ASR).
- **Target label detection**: **DataElixir** accurately detects target labels and avoids false positives on benign labels.

### Summary

**DataElixir** purifies datasets contaminated by backdoor attacks using the forward and reverse processes of a diffusion model. It preserves the model's accuracy on benign data while significantly reducing the success rate of backdoor attacks, offering a new and effective defense against them.
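
The forward noising and abnormal sample identification steps described above can be illustrated with a minimal Python sketch. It assumes the standard DDPM closed-form forward process with a linear noise schedule, and it uses a hypothetical labelling rule. The function names (`linear_betas`, `alpha_bar`, `forward_diffuse`, `categorize`), the schedule values, and the rule itself are illustrative assumptions, not the paper's exact procedure. The reverse process and the classifier are omitted because they require trained models.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; the values are common DDPM defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_diffuse(pixels, t, betas, rng):
    """Closed-form forward step: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bar(betas, t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


def categorize(original_label, candidate_labels):
    """Hypothetical rule based on label transitions in the candidate set.

    - no intermediate image changes label        -> "benign"
    - every image moves to the same other label  -> "poisoned"
    - anything else (mixed transitions)          -> "suspicious"
    """
    changed = [lab for lab in candidate_labels if lab != original_label]
    if not changed:
        return "benign"
    if len(changed) == len(candidate_labels) and len(set(changed)) == 1:
        return "poisoned"
    return "suspicious"


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_betas(1000)
    noisy = forward_diffuse([0.5, -0.5, 0.0], t=100, betas=betas, rng=rng)
    print(len(noisy), categorize(0, [5, 5, 5]))
```

In a full pipeline, each noised image would be passed through a trained reverse process and a classifier to obtain the candidate labels that `categorize` consumes. The three-way split mirrors the benign, contaminated, and suspicious groups named in the summary, but the thresholds here are placeholders.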