Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique, UNIT, that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is available at https://github.com/Megum1/UNIT.

### What problem does this paper attempt to solve?

This paper addresses backdoor attacks on deep neural networks (DNNs). A backdoor attack injects a specific pattern (called a trigger) into the training data, so that the model misclassifies any input containing the trigger as the attacker's target label. Previous studies have proposed various mitigation methods, but these tend to be less effective against recent advanced attacks.

To close this gap, the authors propose a new post-training defense technique, **UNIT (Automated Neural Distribution Tightening)**, which is designed to eliminate the effects of a wide range of backdoor attacks, including recently emerged advanced ones. Its main goals and features are:

1. **Eliminate backdoor effects**: UNIT removes the backdoor behavior from the model, so that an input carrying the trigger no longer leads to the target prediction.
2. **Only a small amount of clean data is required**: UNIT assumes access to a small set of clean training samples (5% in the reported experiments) and needs no knowledge of the trigger pattern.
3. **Fine-grained adjustment**: Unlike existing coarse-grained repair methods, UNIT adjusts the activation distribution of each neuron individually, removing malicious activation values while preserving benign functionality.
4. **Automated optimization**: UNIT uses optimization to set the boundary of each neuron automatically, balancing backdoor mitigation against benign accuracy.

### Specific implementation

- **Activation distribution approximation**: Using a small number of clean samples, UNIT approximates the benign activation distribution of each neuron and sets a tight boundary on it.
- **Clipping large activation values**: During inference, UNIT clips any activation value that falls outside the boundary back to the approximated boundary, suppressing the abnormal activations caused by the trigger.
- **Optimization process**: UNIT adjusts the boundary of each neuron through optimization, so that backdoor behavior is minimized without a significant drop in benign accuracy.

### Experimental results

UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks (including 2 advanced attacks) while using only 5% of the clean training data. UNIT is also cost-efficient and applies across different datasets, network structures, and activation functions.

In conclusion, the paper targets the weakness of existing backdoor defenses against advanced attacks and proposes UNIT as an efficient, general-purpose solution.
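
The bound-then-clip idea described under "Specific implementation" can be illustrated with a minimal sketch. This is not the authors' implementation: UNIT tunes each neuron's boundary through optimization, whereas the sketch below substitutes a fixed nearest-rank quantile of the clean activations and clips only on the upper side. The function names `estimate_bounds` and `clip_activations`, the `quantile` parameter, and the toy data are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def estimate_bounds(clean_activations: Sequence[Sequence[float]],
                    quantile: float = 0.95) -> List[float]:
    """Return one upper bound per neuron, estimated from clean samples.

    clean_activations[i][j] is the activation of neuron j on clean sample i.
    ASSUMPTION: a fixed quantile stands in for UNIT's optimized boundaries.
    """
    if not clean_activations:
        raise ValueError("need at least one clean sample")
    num_neurons = len(clean_activations[0])
    bounds = []
    for j in range(num_neurons):
        values = sorted(sample[j] for sample in clean_activations)
        # Nearest-rank quantile: smallest value covering `quantile` of the data.
        rank = max(0, math.ceil(quantile * len(values)) - 1)
        bounds.append(values[rank])
    return bounds


def clip_activations(activations: Sequence[float],
                     bounds: Sequence[float]) -> List[float]:
    """Clip each neuron's activation down to its bound; smaller values pass through."""
    return [min(a, b) for a, b in zip(activations, bounds)]


# Toy data: 4 clean samples, 2 neurons (hypothetical values).
clean = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.4, 1.1]]
bounds = estimate_bounds(clean, quantile=1.0)  # per-neuron maximum: [0.4, 1.2]

# Neuron 0 fires far above anything seen on clean data, as a trigger might cause.
suspicious = [5.0, 0.5]
print(clip_activations(suspicious, bounds))  # [0.4, 0.5]
```

In a real network the same clipping would be applied to each layer's output during inference. A tighter quantile suppresses more of the abnormal activation but risks clipping benign inputs, which is the trade-off the summary says UNIT resolves automatically.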

UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

ATTEQ-NN: Attention-based QoE-aware Evasive Backdoor Attacks

Backdoor Mitigation by Correcting the Distribution of Neural Activations

Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

Beating Backdoor Attack at Its Own Game

A Practical Trigger-Free Backdoor Attack on Neural Networks

An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers

Backdoor Defense via Decoupling the Training Process

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Enhanced Coalescence Backdoor Attack Against DNN Based on Pixel Gradient

Need for Speed: Taming Backdoor Attacks with Speed and Precision

Test-Time Backdoor Defense via Detecting and Repairing

Defense-Resistant Backdoor Attacks Against Deep Neural Networks in Outsourced Cloud Environment

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Defending Against Backdoor Attacks by Quarantine Training

Detection of backdoor attacks using targeted universal adversarial perturbations for deep neural networks

Adaptive Backdoor Attack Against Deep Neural Networks

Backdoor Cleansing with Unlabeled Data

Mitigating Backdoor Attack Via Prerequisite Transformation

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer