Abstract: Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it relies too heavily on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate that our defense, BAN, is 1.37× (on CIFAR-10) and 5.11× (on ImageNet200) more efficient, with a 9.99% higher detection success rate, than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://anonymous.4open.science/r/ban-4B32

What problem does this paper attempt to address?

This paper addresses the detection of backdoor attacks in deep learning. Existing backdoor defenses are mainly based on backdoor inversion: they recover a mask in the feature space to locate prominent backdoor features, thereby separating benign features from backdoor features. These methods have high computational overhead and rely too heavily on prominent backdoor features that are highly distinguishable from benign features. To overcome these drawbacks, the paper proposes a new backdoor detection method, BAN (Backdoors Activated by Adversarial Neuron Noise, per the paper's title), which improves backdoor feature inversion by adding neuron activation information so that backdoors are detected more efficiently. The main contributions of BAN are:

1. **Identifying the insufficient generalization of existing trigger-inversion detection methods**: Feature-space detection in particular relies too heavily on highly distinguishable, prominent backdoor features. The authors analyze trigger-inversion-based backdoor defenses in depth and point out that the prominent backdoor features exploited by current state-of-the-art defenses may not be suitable for identifying backdoors in the input space.
2. **Proposing the BAN method**: By introducing neuron noise into feature-space trigger inversion, BAN adds an adversarial learning process that incorporates neuron activation information into inversion-based backdoor detection. Experimental results show that BAN is 1.37 times (on CIFAR-10) and 5.11 times (on ImageNet200) more efficient than the state-of-the-art defense BTI-DBF, with a 9.99% higher detection success rate.
3. **Designing a simple backdoor removal method**: Using neuron noise, the authors further design a simple and effective defense that removes the backdoor, yielding a complete defense framework.

Through these contributions, BAN aims to provide a more efficient and more general backdoor detection and defense scheme, especially across different types of backdoor attacks.
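The idea of adversarially increasing a model's loss with respect to its weights can be illustrated with a toy sketch. This is not the authors' implementation: a tiny logistic model stands in for a deep network, and the multiplicative noise form `w * (1 + delta)`, the bound `eps`, the step size `lr`, the data, and all function names are assumptions made for illustration. The detection step that compares backdoored and clean models is not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, delta):
    # Assumed noise form: each weight is scaled by (1 + delta_i).
    return [w * (1.0 + d) for w, d in zip(weights, delta)]


def loss(weights, bias, data):
    """Mean binary cross-entropy of a logistic model on (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def noise_gradient(weights, bias, delta, data):
    """Gradient of the mean loss with respect to the noise vector delta."""
    noisy = perturb(weights, delta)
    grad = [0.0] * len(weights)
    for x, y in data:
        z = sum(w * xi for w, xi in zip(noisy, x)) + bias
        err = sigmoid(z) - y  # dL/dz for cross-entropy
        for i, (w, xi) in enumerate(zip(weights, x)):
            grad[i] += err * w * xi / len(data)  # dz/d(delta_i) = w_i * x_i
    return grad


def adversarial_neuron_noise(weights, bias, data, eps=0.3, steps=10, lr=0.1):
    """Signed gradient ascent on the loss, keeping delta inside [-eps, eps]."""
    delta = [0.0] * len(weights)
    for _ in range(steps):
        grad = noise_gradient(weights, bias, delta, data)
        delta = [min(eps, max(-eps, d + lr * sign(g)))
                 for d, g in zip(delta, grad)]
    return delta


# Hypothetical toy model and clean data, chosen only for the demonstration.
weights = [2.0, -1.0]
bias = 0.0
clean_data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([0.0, 2.0], 0)]

delta = adversarial_neuron_noise(weights, bias, clean_data)
loss_before = loss(weights, bias, clean_data)
loss_after = loss(perturb(weights, delta), bias, clean_data)
print(f"loss before noise: {loss_before:.4f}, after noise: {loss_after:.4f}")
```

In this sketch the noise is bounded so that the perturbed model stays close to the original. According to the abstract, how a model behaves under such loss-increasing weight noise is what helps separate backdoored models from clean ones; the criterion used for that separation is left out because the text above does not specify it.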

BAN: Detecting Backdoors Activated by Adversarial Neuron Noise
