Abstract: Deep neural networks have achieved remarkable success on a variety of mission-critical tasks. However, recent studies show that deep neural networks are vulnerable to backdoor attacks, where the attacker releases backdoored models that behave normally on benign samples but misclassify any trigger-imposed sample to a target label. Unlike adversarial examples, backdoor attacks manipulate both the inputs and the model, perturbing samples with the trigger and injecting backdoors into the model. In this paper, we propose a novel attention-based evasive backdoor attack, dubbed ATTEQ-NN. Unlike existing works that set the trigger mask arbitrarily, we carefully design an attention-based trigger mask determination framework, which places the trigger at the crucial region with the most significant influence on the prediction results. To make the trigger-imposed samples appear more natural and imperceptible to human inspectors, we introduce a Quality-of-Experience (QoE) term into the loss function of trigger generation and carefully adjust the transparency of the trigger. To iteratively optimize the trigger generation and backdoor injection components, we propose an alternating retraining strategy, which is shown to be effective in improving the clean data accuracy and evading some model-based defense approaches. We evaluate ATTEQ-NN with extensive experiments on the VGG-Flower, CIFAR-10, GTSRB, CIFAR-100, and ImageNette datasets. The results show that ATTEQ-NN can increase the attack success rate by as much as 82% over baselines when the poison ratio is low while achieving a high QoE of the backdoored samples. We demonstrate that ATTEQ-NN reaches an attack success rate of more than 37.78% in the physical world under different lighting conditions and shooting angles. ATTEQ-NN preserves an attack success rate of more than 92.5% even if the original backdoored model is fine-tuned with clean data. ATTEQ-NN is also shown to be effective in transfer learning scenarios. Our user studies show that the backdoored samples generated by ATTEQ-NN are indiscernible under visual inspection. ATTEQ-NN is shown to be evasive to state-of-the-art defense methods, including model pruning, NAD, STRIP, NC, and MNTD. We will open-source our code upon publication.
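The abstract does not give the formulas behind the attention-based mask or the transparency adjustment, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an attention map is already available for the image, picks the square window with the largest summed attention, and alpha-blends a trigger patch into that window. The function names, the square trigger shape, and the `alpha` parameter are all hypothetical; the QoE loss term and the alternating retraining strategy are not shown.

```python
import numpy as np


def top_attention_window(attention, size):
    """Return (row, col) of the size x size window with the largest summed attention.

    `attention` is a 2-D array (H, W); this helper is an illustrative assumption.
    """
    h, w = attention.shape
    # An integral image turns every window sum into four lookups.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = attention.cumsum(axis=0).cumsum(axis=1)
    sums = (integral[size:, size:] - integral[:-size, size:]
            - integral[size:, :-size] + integral[:-size, :-size])
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return int(row), int(col)


def impose_trigger(image, trigger, attention, alpha=0.3):
    """Blend a square trigger into the most-attended region of an image.

    image:     float array (H, W, C) with values in [0, 1]
    trigger:   float array (s, s, C) with values in [0, 1]
    attention: float array (H, W)
    alpha:     trigger opacity (hypothetical parameter); smaller is less visible
    """
    size = trigger.shape[0]
    row, col = top_attention_window(attention, size)

    # Binary mask marking where the trigger is placed.
    mask = np.zeros(image.shape[:2] + (1,), dtype=np.float64)
    mask[row:row + size, col:col + size] = 1.0

    # Full-size canvas holding the trigger at the chosen location.
    canvas = np.zeros_like(image, dtype=np.float64)
    canvas[row:row + size, col:col + size] = trigger

    # Pixels outside the mask are unchanged; inside, image and trigger are mixed.
    blended = (1.0 - alpha * mask) * image + alpha * mask * canvas
    return blended, mask[..., 0]
```

In this sketch a lower `alpha` keeps the blended region closer to the original pixels, which is one simple way to trade trigger visibility against trigger strength.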

Redeem Myself: Purifying Backdoors in Deep Learning Models Using Self Attention Distillation.

ARTEMIS: Defending Against Backdoor Attacks Via Distribution Shift

ATTEQ-NN: Attention-based QoE-aware Evasive Backdoor Attacks.

From Toxic to Trustworthy: Using Self-Distillation and Semi-supervised Methods to Refine Neural Networks

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Self-supervised learning backdoor defense mixed with self-attention mechanism

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Backdoor Cleansing with Unlabeled Data

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Need for Speed: Taming Backdoor Attacks with Speed and Precision

Backdoor Defense via Decoupling the Training Process

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

FLSAD: Defending Backdoor Attacks in Federated Learning via Self-Attention Distillation

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization

Augmented Neural Fine-Tuning for Efficient Backdoor Purification

Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder