UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang
2024-07-16
Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique UNIT that can effectively eliminate backdoor effects for a variety of attacks. In specific, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.
Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses backdoor attacks on deep neural networks (DNNs). A backdoor attack ties a specific pattern (called a trigger) to an attacker-chosen target label, typically by poisoning the training data, so that the model misclassifies any input containing the trigger as that label. Previous studies have proposed various mitigation methods, but these tend to be less effective against recent advanced attacks.

To close this gap, the authors propose a new post-training defense technique, **UNIT (Automated Neural Distribution Tightening)**, which removes the effects of a variety of backdoor attacks, including recently emerged advanced ones. Its main goals and features are:

1. **Eliminate backdoor effects**: UNIT aims to remove the backdoor from the model so that an input carrying the trigger no longer leads to the target prediction.
2. **Only a small amount of clean data is required**: UNIT assumes access to a small set of clean training samples (5% of the training data in the reported experiments) and needs no knowledge of the trigger pattern.
3. **Fine-grained adjustment**: Unlike existing coarse-grained repair methods, UNIT adjusts the activation distribution of each neuron individually, removing malicious activation values while preserving benign functionality.
4. **Automated optimization**: UNIT uses optimization to set the boundary of each neuron automatically, so that the backdoor is mitigated while accuracy stays high.

### Specific implementation

- **Activation distribution approximation**: From a small number of clean samples, UNIT approximates the benign activation distribution of each neuron and sets a tight boundary around it.
- **Clipping large activation values**: During inference, UNIT clips activation values that fall outside the boundary back to the approximated boundary, suppressing the abnormal activations that the backdoor triggers.
- **Optimization process**: UNIT adjusts the boundary of each neuron through optimization, minimizing backdoor behavior without a significant drop in benign accuracy.

### Experimental results

UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks (including 2 advanced attacks) while using only 5% of clean training data. UNIT is also cost-efficient and applies to different datasets, network structures, and activation functions.

In conclusion, the paper tackles the weakness of existing backdoor defenses against advanced attacks by proposing UNIT, an efficient and general-purpose mitigation method.
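The approximation and clipping steps listed under "Specific implementation" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `estimate_upper_bounds` and `clip_activations` are hypothetical, and a fixed quantile of the clean activations stands in for the optimization that UNIT uses to tune each boundary. It only shows the basic idea of a per-neuron upper bound learned from clean data and applied at inference time.

```python
from typing import List, Sequence


def estimate_upper_bounds(
    clean_acts: Sequence[Sequence[float]], quantile: float = 0.95
) -> List[float]:
    """Return one upper bound per neuron from clean activations.

    clean_acts[i][j] is the activation of neuron j on clean sample i.
    ASSUMPTION: a fixed quantile replaces the optimization-based
    boundary search described in the summary.
    """
    if not clean_acts:
        raise ValueError("need at least one clean sample")
    n_neurons = len(clean_acts[0])
    bounds = []
    for j in range(n_neurons):
        column = sorted(sample[j] for sample in clean_acts)
        # Nearest-rank index, kept inside the valid range.
        idx = min(len(column) - 1, int(quantile * len(column)))
        bounds.append(column[idx])
    return bounds


def clip_activations(acts: Sequence[float], bounds: Sequence[float]) -> List[float]:
    """Clip each activation to its neuron's upper bound."""
    return [min(a, b) for a, b in zip(acts, bounds)]


# Four clean samples, two neurons (made-up numbers for illustration).
clean = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.4, 1.1]]
bounds = estimate_upper_bounds(clean, quantile=0.5)

# A hypothetical triggered input drives neuron 0 far above its clean range.
print(clip_activations([5.0, 1.0], bounds))  # prints [0.3, 1.0]
```

In the example, neuron 0's abnormal activation of 5.0 is pulled back to its bound, while neuron 1's in-range value passes through unchanged. A tighter quantile suppresses more of the trigger's effect but also clips more benign activations, which is the accuracy trade-off that the optimization step is described as balancing.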