Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia
2024-07-14
Abstract: Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input containing a trigger and leads to a wrong prediction, which causes serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effectively with limited computing resources, especially when the sizes and numbers of the triggers vary, as they do in the physical world. We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair. In the first phase of our method, the CAM-focus Evolutionary Trigger Filter (CETF) is proposed for trigger detection. CETF is an effective sample-preprocessing method built on an evolutionary algorithm. Our experimental results show that CETF not only distinguishes images with triggers accurately from clean images, but can also be widely used in practice because of its simplicity and stability across different backdoor attack situations. In the second phase of our method, we leverage several lightweight unlearning methods with the trigger detected by CETF for model repair, which also constructively demonstrates the underlying correlation of the backdoor with Batch Normalization layers. Source code will be published after acceptance.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses backdoor attacks on deep neural networks (DNNs). In a backdoor attack, a specific trigger added to the input causes a compromised model to make an incorrect prediction, which leads to serious security issues. Existing defenses struggle to eliminate backdoors when computing resources are limited, and they cope poorly with triggers that vary in size and number. To address these issues, the authors propose an effective backdoor defense based on evolutionary trigger detection and lightweight model repair. The method has two stages:

1. **Evolutionary Trigger Detection**:
   - The CAM (Class Activation Mapping)-focused Evolutionary Trigger Filter (CETF) is used for trigger detection. CETF combines sample pre-processing with an evolutionary algorithm. It accurately distinguishes images with triggers from clean images and is simple and stable across different backdoor attack situations.
   - The specific steps are: first, GradCAM generates a saliency map that gives the approximate location of the influential area; then the Differential Evolution (DE) algorithm searches for the area with the greatest impact. The optimized area is pasted onto a set of clean images, and the change in the prediction results determines whether a trigger exists.
2. **Lightweight Model Repair**:
   - The triggers detected by CETF are used to repair the backdoored model through several lightweight unlearning methods, such as fine-tuning based on reverse triggers.
   - The authors further find that the backdoor is hidden in the Batch Normalization (BN) layers, and propose two more efficient model repair methods, BN-unlearning and BN-cleaning. Both modify only the parameters of the BN layers and are reported to perform well in extensive experiments.
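The detection steps above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for a backdoored classifier, the fixed `bounds` stand in for the coarse region that GradCAM would supply, and the fitness (mean target-class confidence minus an area penalty) is an assumed objective chosen only to make the search concrete. The DE variant shown is the standard rand/1/bin scheme.

```python
import random

SIZE = 8  # toy images are SIZE x SIZE grids of floats
TRIGGER = [(6, 6), (6, 7), (7, 6), (7, 7)]  # hypothetical 2x2 trigger pixels (row, col)

def toy_model(img):
    """Hypothetical backdoored classifier: returns target-class confidence."""
    return sum(img[r][c] >= 0.9 for r, c in TRIGGER) / len(TRIGGER)

def paste(src, dst, box):
    """Copy the rectangle box=(row, col, height, width) from src onto a copy of dst."""
    r0, c0, h, w = box
    out = [row[:] for row in dst]
    for r in range(r0, min(r0 + h, SIZE)):
        for c in range(c0, min(c0 + w, SIZE)):
            out[r][c] = src[r][c]
    return out

def flip_rate(src, clean_images, box, threshold=0.75):
    """Fraction of clean images pushed to the target class by the pasted area."""
    flipped = sum(toy_model(paste(src, img, box)) >= threshold for img in clean_images)
    return flipped / len(clean_images)

def fitness(src, clean_images, box, area_penalty=0.01):
    """Assumed objective: high impact on clean images, small pasted area."""
    conf = sum(toy_model(paste(src, img, box)) for img in clean_images) / len(clean_images)
    return conf - area_penalty * box[2] * box[3]

def decode(vec, bounds):
    """Round a continuous DE vector to an integer box and clip it to the bounds."""
    return tuple(min(max(int(round(v)), lo), hi) for v, (lo, hi) in zip(vec, bounds))

def de_search(src, clean_images, bounds, pop_size=20, generations=40, F=0.5, CR=0.9, seed=0):
    """Differential evolution (rand/1/bin) over box parameters; maximizes fitness."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(src, clean_images, decode(p, bounds)) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = [a[k] + F * (b[k] - c[k]) if (rng.random() < CR or k == j_rand)
                     else pop[i][k] for k in range(dim)]
            s = fitness(src, clean_images, decode(trial, bounds))
            if s >= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return decode(pop[best], bounds), scores[best]

# Toy data: clean pixels lie in [0, 0.5]; the poisoned image carries the trigger.
data_rng = random.Random(1)
def clean_image():
    return [[data_rng.uniform(0.0, 0.5) for _ in range(SIZE)] for _ in range(SIZE)]

clean_images = [clean_image() for _ in range(10)]
poisoned = clean_image()
for r, c in TRIGGER:
    poisoned[r][c] = 1.0

# Coarse search region (row, col, height, width); a stand-in for the GradCAM step.
bounds = [(4, 7), (4, 7), (1, 4), (1, 4)]
box, score = de_search(poisoned, clean_images, bounds)
trigger_found = flip_rate(poisoned, clean_images, box) >= 0.9
```

The final check mirrors the paste-and-compare test: an input is flagged as carrying a trigger only if the area found by the search also changes the predictions of clean images. The 0.9 flip-rate cutoff is an illustrative choice, not a value taken from the paper.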
The main contributions of the paper include:

- Evolutionary algorithms are applied to backdoor defense for the first time. CETF accurately searches for triggers in poisoned inputs and is highly robust.
- A defense based on trigger extraction and model repair shows superior results, and two lightweight model repair methods are proposed to improve efficiency.
- The "shortcut" that the backdoor takes through the BN layers is explored and verified in extensive experiments, a finding that is relevant to future research on AI backdoors and model interpretability.
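The idea of repairing a model by touching only its BN layers can be sketched in PyTorch as follows. This is a hedged illustration, not the paper's BN-unlearning or BN-cleaning procedure, whose details are not given here. It assumes that unlearning means fine-tuning on clean images and trigger-stamped images that keep their true labels, and that only the BN affine parameters are trainable. The tiny model, the random data, and the helper names `select_bn_parameters`, `paste_trigger`, and `bn_unlearning_step` are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def select_bn_parameters(model):
    """Freeze every parameter, then re-enable gradients for BN weight and bias only."""
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            for p in module.parameters():
                p.requires_grad_(True)
                bn_params.append(p)
    return bn_params

def paste_trigger(images, trigger, top, left):
    """Stamp a detected trigger patch (C, h, w) onto a copy of a batch (N, C, H, W)."""
    stamped = images.clone()
    h, w = trigger.shape[-2:]
    stamped[:, :, top:top + h, left:left + w] = trigger
    return stamped

def bn_unlearning_step(model, optimizer, clean_images, true_labels, trigger, top, left):
    """One update in which triggered inputs are trained toward their true labels."""
    model.train()  # note: train mode also refreshes the BN running statistics
    inputs = torch.cat([clean_images, paste_trigger(clean_images, trigger, top, left)])
    labels = torch.cat([true_labels, true_labels])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical tiny model and random data, only to show the wiring.
torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
bn_params = select_bn_parameters(model)
optimizer = torch.optim.SGD(bn_params, lr=0.01)

images = torch.rand(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
trigger = torch.ones(3, 4, 4)  # placeholder for a trigger recovered by detection
conv_before = model[0].weight.clone()
loss = bn_unlearning_step(model, optimizer, images, labels, trigger, top=28, left=28)
```

Because the optimizer receives only the BN parameters, the convolutional and linear weights stay untouched, which is what keeps a repair of this kind lightweight.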