Abstract: Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs from clean models. UAPs of backdoored models tend to exploit the shortcut from all classes to the target class, built by the backdoor trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs. Experiments on 345 models trained on several datasets show that USB effectively detects the injected backdoor and provides comparable or better results than state-of-the-art methods.

What problem does this paper attempt to address?

This paper addresses the detection of backdoor attacks in machine-learning models. Specifically, the authors propose a new detection method, **Universal Soldier for Backdoor detection (USB)**, which exploits the similarity between universal adversarial perturbations (UAPs) and backdoor triggers.

### Background and Motivation

1. **Limitations of Existing Methods**:
   - **Reverse-engineering methods** (such as Neural Cleanse (NC) and TABOR) may fail to detect backdoors in some cases, especially when the backdoor trigger closely resembles class features.
   - **Non-patch-based attacks** (such as input-aware dynamic backdoor attacks) challenge existing methods, which usually start optimization from random points and therefore struggle to recover specifically designed triggers.
2. **Similarity between UAP and Backdoor Triggers**:
   - The authors observe that a UAP generated from a backdoored model requires less perturbation to mislead the model, indicating that the UAP can capture the characteristics of backdoor neurons.
   - Using this property, USB can use UAPs directly to detect potential backdoors and is more efficient than existing methods.

### Method Overview

1. **Threat Model**:
   - The paper considers defense in the machine-learning-as-a-service (MLaaS) scenario, where the attacker controls or has access to the MLaaS platform and aims to inject backdoors into the model.
2. **Targeted UAP Generation**:
   - The authors modify the existing UAP generation algorithm so that it can generate a UAP for a specific target class (Algorithm 1 in the paper). The perturbation vector \(v\) is updated iteratively until the perturbed inputs \(x_i + v\) are misclassified as the target class \(t\).
3. **UAP Optimization**:
   - To further analyze potential triggers, the authors introduce an optimization phase that updates the targeted UAP by minimizing a loss function. The loss combines cross-entropy loss, the structural similarity index (SSIM), and an L1 norm, as given in formula (1):

   \[ L = L_{ce}(\text{output}, t) - \text{SSIM}(x, x') + \text{norm}_{L1}(\text{mask}) \]

### Experimental Results

1. **Datasets and Models**:
   - The authors conduct experiments on multiple datasets (CIFAR-10, ImageNet, GTSRB, MNIST) and model architectures (ResNet-18, VGG-16, EfficientNet-B0).
2. **Performance Evaluation**:
   - **Model Detection**: USB performs well at detecting backdoored models, especially against strong backdoor attacks (such as Latent Backdoor and input-aware dynamic backdoor attacks).
   - **Target Class Detection**: USB can accurately identify the target class of the backdoor, while other methods (such as NC and TABOR) may fail in some cases.
3. **Time Consumption**:
   - Compared with NC and TABOR, USB significantly reduces the time required to reverse-engineer potential triggers, because a UAP generalizes across different networks and only needs to be generated once for similar models.

### Conclusion

This paper proposes USB, a new backdoor detection method. By exploiting the similarity between UAPs and backdoor triggers, it effectively detects multiple types of backdoor attacks. The experimental results show that USB performs well across multiple datasets and model architectures, especially against strong backdoor attacks. Future work will further optimize the UAP generation process to reduce the optimization time.
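
The two technical steps in the Method Overview, targeted UAP generation and the loss in formula (1), can be illustrated with a minimal pure-Python sketch. It is not the paper's Algorithm 1: the toy linear softmax classifier (`W`, `b`), the plain gradient step on cross-entropy, the single-window `global_ssim`, and the choice to pass `v` itself as the mask are all assumptions made for illustration, and every function name here is hypothetical.

```python
import math

def logits(W, b, x):
    # Toy linear model (assumption): one row of W and one bias per class.
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def add(x, v):
    return [xi + vi for xi, vi in zip(x, v)]

def targeted_uap(W, b, inputs, target, step=0.05, max_epochs=200):
    """Iteratively update one shared perturbation v until every x_i + v
    is classified as `target` (simplified stand-in for Algorithm 1)."""
    v = [0.0] * len(inputs[0])
    for _ in range(max_epochs):
        if all(predict(W, b, add(x, v)) == target for x in inputs):
            break
        for x in inputs:
            xv = add(x, v)
            if predict(W, b, xv) == target:
                continue  # this sample is already fooled
            p = softmax(logits(W, b, xv))
            # Gradient of cross-entropy w.r.t. the input: W^T (p - onehot(t)).
            err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
            grad = [sum(err[k] * W[k][j] for k in range(len(W)))
                    for j in range(len(v))]
            v = [vi - step * g for vi, g in zip(v, grad)]
    return v

def global_ssim(x, y, data_range=1.0):
    # SSIM computed over the whole vector as a single window (simplification).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((c - my) ** 2 for c in y) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def usb_style_loss(probs, target, x, x_adv, mask):
    # Formula (1): L = L_ce(output, t) - SSIM(x, x') + norm_L1(mask).
    ce = -math.log(max(probs[target], 1e-12))
    return ce - global_ssim(x, x_adv) + sum(abs(m) for m in mask)

# Hypothetical 3-class, 4-feature toy data; class 2 is the target.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]
inputs = [[0.9, 0.1, 0.1, 0.1], [0.1, 0.9, 0.1, 0.1], [0.8, 0.2, 0.0, 0.1]]
target = 2

v = targeted_uap(W, b, inputs, target)
x = inputs[0]
x_adv = add(x, v)
# Using v as the mask is an assumption; the summary does not define the mask.
loss = usb_style_loss(softmax(logits(W, b, x_adv)), target, x, x_adv, mask=v)
print("L1 norm of UAP:", sum(abs(vi) for vi in v))
print("loss for first sample:", loss)
```

The printed L1 norm is the quantity the abstract's observation concerns: a backdoored model is expected to need a smaller perturbation toward its target class than a clean model does. On real image models, the gradient would come from an autodiff framework and SSIM would normally be computed over local windows.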

Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks
