Abstract: Object detection models, widely used in security-critical applications, are vulnerable to backdoor attacks that cause targeted misclassifications when triggered by specific patterns. Existing backdoor defense techniques, primarily designed for simpler models like image classifiers, often fail to effectively detect and remove backdoors in object detectors. We propose a backdoor defense framework tailored to object detection models, based on the observation that backdoor attacks cause significant inconsistencies between local modules' behaviors, such as the Region Proposal Network (RPN) and classification head. By quantifying and analyzing these inconsistencies, we develop an algorithm to detect backdoors. We find that the inconsistent module is usually the main source of backdoor behavior, leading to a removal method that localizes the affected module, resets its parameters, and fine-tunes the model on a small clean dataset. Extensive experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate over fine-tuning baselines, while limiting clean data accuracy loss to less than 4%. To the best of our knowledge, this work presents the first approach that addresses both the detection and removal of backdoors in two-stage object detection models, advancing the field of securing these complex systems against backdoor attacks.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the backdoor attack problem in object detection models. Specifically, the paper focuses on how to identify and remove backdoors in these models to improve their security and reliability in practical applications. Due to their complex architectures, object detection models are vulnerable to backdoor attacks. Such attacks cause the models to misclassify when they receive specific trigger patterns while behaving normally on clean inputs. Existing backdoor defense techniques mainly target relatively simple models (such as image classifiers) and are ineffective on object detection models, making it difficult to detect and remove backdoors while maintaining model performance.

### Main contributions of the paper:

1. **Explore the backdoor removal problem in object detection models for the first time**: The paper fills this gap in the field and aims to develop safer and more robust object detection models that resist backdoor attacks.
2. **Reveal the vulnerability of object detection models to backdoor attacks**: By analyzing the inconsistent behavior between different modules, the paper proposes an effective backdoor detection method. In particular, it uses the inconsistency between the Region Proposal Network (RPN) and the Region-based Convolutional Neural Network (R-CNN) to detect backdoors.
3. **Propose a novel backdoor removal technique**: By combining local initialization with global fine-tuning, the method not only eliminates the backdoor effect but also minimizes the impact on the model's performance on normal data.

### Key methods of the paper:

1. **Cross-module inconsistency detection**:
   - **Inconsistency score calculation**: For each trigger sample \(x\), extract its RPN output \(\{r_i\}_{i = 1}^N\) and R-CNN output \(\{(p_i, t_i)\}_{i = 1}^N\), where \(N\) is the number of proposals. Calculate the difference between the RPN classification score and the R-CNN classification score for each proposal as the inconsistency score \(s_i\): \[ s_i = \|r_i - p_i\|_1 \]
   - **Ignore-difference threshold setting**: Introduce an ignore-difference threshold \(\epsilon\), treat only scores \(s_i\) greater than \(\epsilon\) as significant inconsistencies, and collect them into the set \(S\) for further analysis.
   - **Arithmetic mean calculation**: Calculate the arithmetic mean \(\mu\) of the significant inconsistency scores in the set \(S\): \[ \mu = \frac{1}{|S|}\sum_{s \in S} s \]
   - **Backdoor judgment threshold selection**: Compare the arithmetic mean \(\mu\) with the predefined backdoor judgment threshold \(\theta\). If \(\mu\) is greater than \(\theta\), the model is considered to be possibly under backdoor attack; otherwise, the model is considered normal.
2. **Target reset fine-tuning**:
   - **Identify affected modules**: Use the function `IdentifyAffectedModule(M)` to identify the modules most severely affected by the backdoor. This function uses the inconsistency scores calculated in the backdoor detection algorithm to determine the affected modules.
   - **Locally initialize affected parameters**: Re-initialize the parameters of the affected modules to eliminate backdoor-related information.
   - **Fine-tune the model on enhanced clean data**: Use a small clean dataset to fine-tune the entire model, so that the model adapts to the parameter changes while maintaining performance on clean data.

### Experimental results:

The paper conducted extensive experiments on multiple state-of-the-art object detection models, including Faster R-CNN, Faster R-CNN FPN, Mask R-CNN, and Double-Head R-CNN. The experimental results show that this method can successfully detect and remove backdoors, with the backdoor removal rate increased by about 90% compared to the baseline method, while the accuracy loss on clean data does not exceed 4%.
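
The cross-module inconsistency detection steps listed under "Key methods of the paper" can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes that each \(r_i\) and \(p_i\) has already been extracted from the detector as a score vector of the same length, and the function names, the example values of \(\epsilon\) and \(\theta\), and the choice to return `None` when \(S\) is empty are all hypothetical.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def inconsistency_scores(rpn_scores: Sequence[Vector],
                         rcnn_scores: Sequence[Vector]) -> List[float]:
    """Return s_i = ||r_i - p_i||_1 for every proposal i.

    Assumption: r_i and p_i are score vectors of equal length, for example
    [background, foreground] probabilities for the same proposal.
    """
    if len(rpn_scores) != len(rcnn_scores):
        raise ValueError("RPN and R-CNN outputs must cover the same proposals")
    scores = []
    for r_i, p_i in zip(rpn_scores, rcnn_scores):
        if len(r_i) != len(p_i):
            raise ValueError("score vectors must have the same dimension")
        # L1 norm of the difference between the two score vectors
        scores.append(sum(abs(r - p) for r, p in zip(r_i, p_i)))
    return scores


def mean_significant_inconsistency(scores: Sequence[float],
                                   epsilon: float) -> Optional[float]:
    """Return mu, the mean of the scores strictly greater than epsilon.

    Returns None when no score exceeds epsilon (the set S is empty);
    the summary does not say how that case is handled, so this is a choice.
    """
    significant = [s for s in scores if s > epsilon]
    if not significant:
        return None
    return sum(significant) / len(significant)


def is_possibly_backdoored(rpn_scores: Sequence[Vector],
                           rcnn_scores: Sequence[Vector],
                           epsilon: float,
                           theta: float) -> bool:
    """Flag the model when mu exceeds the judgment threshold theta."""
    mu = mean_significant_inconsistency(
        inconsistency_scores(rpn_scores, rcnn_scores), epsilon)
    return mu is not None and mu > theta


if __name__ == "__main__":
    # Toy scores for three proposals; the second one disagrees strongly.
    rpn = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    rcnn = [[0.85, 0.15], [0.9, 0.1], [0.5, 0.5]]
    # epsilon and theta are illustrative values, not taken from the paper.
    print(is_possibly_backdoored(rpn, rcnn, epsilon=0.2, theta=1.0))
```

In the toy input, only the second proposal exceeds \(\epsilon\), so \(\mu\) is computed from that single score. In practice, both thresholds would have to be calibrated on models known to be clean.
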
In conclusion, the paper addresses backdoor attacks on two-stage object detection models by detecting them through cross-module inconsistency and removing them through module reset and fine-tuning, offering a new direction for improving model security and reliability.

Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

BadDet: Backdoor Attacks on Object Detection

Untargeted Backdoor Attack against Object Detection

Attacking by Aligning: Clean-Label Backdoor Attacks on Object Detection

AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection

Detector Collapse: Physical-World Backdooring Object Detection to Catastrophic Overload or Blindness in Autonomous Driving

On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World

Mask-based Invisible Backdoor Attacks on Object Detection

What's Wrong with the Robustness of Object Detectors?

Dangerous Cloaking: Natural Trigger Based Backdoor Attacks on Object Detectors in the Physical World

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Understanding Object Detection Through An Adversarial Lens

Towards A Critical Evaluation of Robustness for Deep Learning Backdoor Countermeasures

Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

Toward a Critical Evaluation of Robustness for Deep Learning Backdoor Countermeasures

A Study of Backdoor Attacks Against the Object Detection Model YOLOv5

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Detection of Backdoors in Trained Classifiers Without Access to the Training Set

Backdoor Attacks with Wavelet Embedding: Revealing and enhancing the insights of vulnerabilities in visual object detection models on transformers within digital twin systems

DetectS Ec: Evaluating the Robustness of Object Detection Models to Adversarial Attacks