Backdoor Defense via Adaptively Splitting Poisoned Dataset

Kuofeng Gao, Yang Bai, Jindong Gu, Yong Yang, Shu-Tao Xia
2023-03-23
Abstract: Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoor attacked and thus maliciously altered. Since DNNs usually adopt some external training data from an untrusted third party, a robust backdoor defense strategy during the training stage is of importance. We argue that the core of training-time defense is to select poisoned samples and to handle them properly. In this work, we summarize the training-time defenses from a unified framework as splitting the poisoned dataset into two data pools. Under our framework, we propose an adaptively splitting dataset-based defense (ASD). Concretely, we apply loss-guided split and meta-learning-inspired split to dynamically update two data pools. With the split clean data pool and polluted data pool, ASD successfully defends against backdoor attacks during training. Extensive experiments on multiple benchmark datasets and DNN models against six state-of-the-art backdoor attacks demonstrate the superiority of our ASD. Our code is available at https://github.com/KuofengGao/ASD.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses backdoor attacks that are planted in deep neural networks (DNNs) through poisoned training data, and proposes a new training-time defense. Specifically:

1. **Core Problem**:
   - How to effectively select and handle poisoned samples during the training phase.
2. **Research Background**:
   - DNNs are susceptible to backdoor attacks, in which a small number of poisoned samples are injected into the training dataset so that the model behaves maliciously whenever a specific trigger pattern is present.
   - External training data often comes from untrusted third parties, so an effective defense is needed during the training phase itself.
3. **Proposed Solution**:
   - The paper proposes an adaptively splitting dataset-based defense (ASD), which divides the poisoned dataset into two pools: a clean data pool (containing samples with trusted labels) and a polluted data pool (containing the poisoned samples and the remaining clean samples).
   - The two pools are updated dynamically through a loss-guided split and a meta-learning-inspired split, and the model is trained with semi-supervised learning, which lets the method defend against backdoor attacks during training.
4. **Experimental Validation**:
   - Experiments on multiple benchmark datasets and DNN models against six state-of-the-art backdoor attacks demonstrate the effectiveness of the proposed method.

In summary, the paper offers a new training-time method for defending against backdoor attacks, with the goal of keeping DNNs secure and robust when they are trained on untrusted data.
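
The idea of splitting a dataset into a clean pool and a polluted pool can be illustrated with a minimal sketch of a loss-guided split. This is not the authors' implementation: the function name `loss_guided_split`, the `per_class` parameter, and the rule "take the lowest-loss samples within each class" are assumptions made for illustration, based on the premise that a low per-sample loss suggests a more trustworthy label. The meta-learning-inspired split and the semi-supervised training step are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def loss_guided_split(
    losses: Sequence[float],
    labels: Sequence[int],
    per_class: int,
) -> Tuple[List[int], List[int]]:
    """Split sample indices into a clean pool and a polluted pool.

    Hypothetical rule (an assumption, not taken from the paper): within each
    class, the `per_class` samples with the lowest loss go to the clean pool;
    every other sample goes to the polluted pool.
    """
    if len(losses) != len(labels):
        raise ValueError("losses and labels must have the same length")
    if per_class < 0:
        raise ValueError("per_class must be non-negative")

    # Group sample indices by their (possibly untrusted) label.
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Within each class, keep the lowest-loss samples as "clean".
    clean: List[int] = []
    for indices in by_class.values():
        ranked = sorted(indices, key=lambda i: losses[i])
        clean.extend(ranked[:per_class])

    # Everything not selected lands in the polluted pool.
    clean_set = set(clean)
    polluted = [i for i in range(len(losses)) if i not in clean_set]
    return sorted(clean), polluted


# Toy per-sample losses and labels (made-up values for illustration only).
losses = [0.1, 2.3, 0.4, 0.05, 1.7, 0.9]
labels = [0, 0, 0, 1, 1, 1]
clean, polluted = loss_guided_split(losses, labels, per_class=2)
print(clean, polluted)  # [0, 2, 3, 5] [1, 4]
```

In a semi-supervised setup, the clean pool would typically be used with its labels and the polluted pool treated as unlabeled data; re-running the split as training progresses is one way to picture the "dynamic update" of the two pools described above.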