Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

Xianda Zhang, Siyuan Liang
2024-09-30
Abstract: Object detection models, widely used in security-critical applications, are vulnerable to backdoor attacks that cause targeted misclassifications when triggered by specific patterns. Existing backdoor defense techniques, primarily designed for simpler models like image classifiers, often fail to effectively detect and remove backdoors in object detectors. We propose a backdoor defense framework tailored to object detection models, based on the observation that backdoor attacks cause significant inconsistencies between local modules' behaviors, such as the Region Proposal Network (RPN) and classification head. By quantifying and analyzing these inconsistencies, we develop an algorithm to detect backdoors. We find that the inconsistent module is usually the main source of backdoor behavior, leading to a removal method that localizes the affected module, resets its parameters, and fine-tunes the model on a small clean dataset. Extensive experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate over fine-tuning baselines, while limiting clean data accuracy loss to less than 4%. To the best of our knowledge, this work presents the first approach that addresses both the detection and removal of backdoors in two-stage object detection models, advancing the field of securing these complex systems against backdoor attacks.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses backdoor attacks on object detection models, and specifically how to identify and remove backdoors so that these models are more secure and reliable in practical applications. A backdoored model performs normally on clean inputs but misclassifies when it receives a specific trigger pattern, and the complex architectures of object detectors leave them vulnerable to such attacks. Existing backdoor defense techniques mainly target relatively simple models (such as image classifiers) and struggle to detect and remove backdoors in object detectors while maintaining model performance.

### Main contributions of the paper:

1. **Explore the backdoor removal problem in object detection models for the first time**: The paper fills this gap in the field and aims to develop safer and more robust object detection models that resist backdoor attacks.
2. **Reveal the vulnerability of object detection models to backdoor attacks**: By analyzing inconsistent behavior between different modules, the paper proposes an effective backdoor detection method. In particular, it uses the inconsistency between the Region Proposal Network (RPN) and the Region-based Convolutional Neural Network (R-CNN) to detect backdoors.
3. **Propose a novel backdoor removal technique**: By combining local initialization with global fine-tuning, the method eliminates the backdoor effect while minimizing the impact on the model's performance on normal data.

### Key methods of the paper:

1. **Cross-module inconsistency detection**:
   - **Inconsistency score calculation**: For each trigger sample \(x\), extract its RPN output \(\{r_i\}_{i=1}^N\) and R-CNN output \(\{(p_i, t_i)\}_{i=1}^N\), where \(N\) is the number of proposals. Calculate the difference between the RPN classification score and the R-CNN classification score for each proposal as the inconsistency score \(s_i\):
     \[ s_i = \|r_i - p_i\|_1 \]
   - **Ignore-difference threshold setting**: Introduce an ignore-difference threshold \(\epsilon\). Only scores \(s_i\) greater than \(\epsilon\) count as significant inconsistencies; collect them into the set \(S\) for further analysis.
   - **Arithmetic mean calculation**: Calculate the arithmetic mean \(\mu\) of the significant inconsistency scores in the set \(S\):
     \[ \mu = \frac{1}{|S|}\sum_{s \in S} s \]
   - **Backdoor judgment threshold selection**: Compare the arithmetic mean \(\mu\) with the predefined backdoor judgment threshold \(\theta\). If \(\mu\) is greater than \(\theta\), the model is considered possibly backdoored; otherwise, the model is considered normal.
2. **Target reset fine-tuning**:
   - **Identify affected modules**: Use the function `IdentifyAffectedModule(M)` to identify the modules most severely affected by the backdoor. This function uses the inconsistency scores calculated in the backdoor detection algorithm to determine the affected modules.
   - **Locally initialize affected parameters**: Re-initialize the parameters of the affected modules to eliminate backdoor-related information.
   - **Fine-tune the model on enhanced clean data**: Use a small clean dataset to fine-tune the entire model, so that the model adapts to the parameter changes while maintaining performance on clean data.

### Experimental results:

The paper reports extensive experiments on several state-of-the-art two-stage object detectors, including Faster R-CNN, Faster R-CNN FPN, Mask R-CNN, and Double-Head R-CNN. The results show that the method can detect and remove backdoors, improving the backdoor removal rate by about 90% over fine-tuning baselines while keeping the accuracy loss on clean data below 4%.
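The cross-module inconsistency detection steps listed under the key methods can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, each \(r_i\) and \(p_i\) is assumed to be a score vector of the same length (for example, both reduced to background/foreground probabilities), and treating an empty set \(S\) as "normal" is an assumption the summary does not spell out.

```python
from typing import List, Optional, Sequence


def inconsistency_scores(
    rpn_scores: Sequence[Sequence[float]],
    rcnn_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Compute s_i = ||r_i - p_i||_1 for every proposal i.

    Assumption: r_i and p_i are score vectors of equal length.
    """
    if len(rpn_scores) != len(rcnn_scores):
        raise ValueError("RPN and R-CNN outputs must cover the same proposals")
    scores = []
    for r_i, p_i in zip(rpn_scores, rcnn_scores):
        if len(r_i) != len(p_i):
            raise ValueError("score vectors must have the same length")
        scores.append(sum(abs(r - p) for r, p in zip(r_i, p_i)))
    return scores


def mean_significant_inconsistency(
    scores: Sequence[float], epsilon: float
) -> Optional[float]:
    """Return mu, the mean of the scores strictly greater than epsilon.

    Returns None when no score exceeds epsilon (S is empty).
    """
    significant = [s for s in scores if s > epsilon]
    if not significant:
        return None
    return sum(significant) / len(significant)


def is_possibly_backdoored(
    scores: Sequence[float], epsilon: float, theta: float
) -> bool:
    """Flag the model when mu > theta.

    Assumption: an empty S is treated as 'normal'.
    """
    mu = mean_significant_inconsistency(scores, epsilon)
    return mu is not None and mu > theta


if __name__ == "__main__":
    # Made-up scores for three proposals, purely for illustration.
    rpn = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    rcnn = [[0.85, 0.15], [0.9, 0.1], [0.5, 0.5]]
    s = inconsistency_scores(rpn, rcnn)
    print(s, is_possibly_backdoored(s, epsilon=0.2, theta=1.0))
```

In this toy input only the second proposal disagrees strongly between the two modules, so it is the only score that survives the \(\epsilon\) filter and it alone determines \(\mu\). The values of \(\epsilon\) and \(\theta\) here are arbitrary; the summary does not state how the paper selects them.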
In conclusion, the paper tackles backdoor attacks on two-stage object detection models by detecting cross-module inconsistencies and by resetting and fine-tuning the affected module, and it offers a new direction for improving the security and reliability of these models.
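The localization and reset steps of target reset fine-tuning can be illustrated on a toy parameter dictionary. This is a hedged sketch under several assumptions: the summary does not say how `IdentifyAffectedModule(M)` maps inconsistency scores to modules, so `identify_affected_module` below simply picks the module with the largest score; the per-module scores, the `module.param` naming scheme, and the Gaussian re-initialization are all hypothetical choices. The final fine-tuning step on clean data is omitted.

```python
import random
from typing import Dict, List, Optional


def identify_affected_module(module_scores: Dict[str, float]) -> str:
    """Hypothetical stand-in for IdentifyAffectedModule(M).

    Assumption: each module has one aggregate inconsistency score and
    the module with the largest score is the affected one.
    """
    if not module_scores:
        raise ValueError("no module scores given")
    return max(module_scores, key=lambda name: module_scores[name])


def reset_module(
    params: Dict[str, List[float]],
    module: str,
    std: float = 0.01,
    seed: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Return a copy of params with the chosen module re-initialized.

    Parameters whose name starts with '<module>.' are redrawn from a
    zero-mean Gaussian (an assumed initializer); all others are copied.
    """
    rng = random.Random(seed)
    prefix = module + "."
    new_params: Dict[str, List[float]] = {}
    for name, values in params.items():
        if name.startswith(prefix):
            new_params[name] = [rng.gauss(0.0, std) for _ in values]
        else:
            new_params[name] = list(values)
    return new_params


if __name__ == "__main__":
    # Toy weights and scores, purely for illustration.
    weights = {"rpn.w": [1.0, 2.0], "roi_head.w": [3.0, 4.0, 5.0]}
    scores = {"rpn": 0.3, "roi_head": 1.4}
    target = identify_affected_module(scores)
    print(target, reset_module(weights, target, seed=0))
```

Only the selected module loses its learned weights, which is the point of resetting locally: the rest of the model keeps what it learned from clean data, and the subsequent fine-tuning pass has less to recover.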