Abstract: Extensive evidence has demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks, which motivates the development of backdoor attack detection. Most detection methods are designed to verify whether a model is infected with presumed types of backdoor attacks, yet in practice the adversary is likely to generate diverse backdoor attacks that defenders have not foreseen, which challenges current detection strategies. In this paper, we focus on this more challenging scenario and propose a universal backdoor attack detection method named Adaptive Adversarial Probe (A2P). Specifically, we posit that the challenge of universal backdoor attack detection lies in the fact that different backdoor attacks often exhibit diverse characteristics in their trigger patterns (i.e., sizes and transparencies). Therefore, A2P adopts a global-to-local probing framework, which adversarially probes images with adaptive regions and budgets to fit backdoor triggers of different sizes and transparencies. For the probing region, we propose an attention-guided region generation strategy that generates region proposals of different sizes and locations based on the attention of the target model, since trigger regions often manifest higher model activation. For the attack budget, we introduce box-to-sparsity scheduling, which iteratively increases the perturbation budget from a box constraint to a sparse constraint so that latent backdoors with different transparencies can be better activated. Extensive experiments on multiple datasets (CIFAR-10, GTSRB, Tiny-ImageNet) demonstrate that our method outperforms state-of-the-art baselines by large margins (+12%).
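
To make the idea of attention-guided region generation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the attention of the target model is already available as a 2D grid of scores, and it proposes, for each candidate size, the square window with the highest mean attention. The function name `propose_regions`, the square-window shape, and the mean-attention scoring rule are all illustrative assumptions; the abstract does not specify them.

```python
def propose_regions(attention, sizes):
    """Return one (top, left, size) box per size, maximizing mean attention.

    Hypothetical helper: `attention` is a 2D list of non-negative scores,
    `sizes` is a list of square window side lengths to try.
    """
    height, width = len(attention), len(attention[0])
    proposals = []
    for size in sizes:
        if size > height or size > width:
            continue  # this window does not fit inside the attention map
        best_box, best_score = None, float("-inf")
        # Brute-force scan over every window position of this size.
        for top in range(height - size + 1):
            for left in range(width - size + 1):
                total = sum(
                    attention[r][c]
                    for r in range(top, top + size)
                    for c in range(left, left + size)
                )
                score = total / (size * size)
                if score > best_score:
                    best_box, best_score = (top, left, size), score
        proposals.append(best_box)
    return proposals


# Toy 4x4 attention map with a hot spot in the lower-right corner
# (assumed data, chosen only for illustration).
toy_attention = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
regions = propose_regions(toy_attention, sizes=[1, 2, 4])
```

On the toy map, the small and medium windows land on the hot spot while the largest window covers the whole map, which mirrors the global-to-local intuition: larger regions suit large or diffuse triggers, smaller ones suit compact patches. The adversarial probing within each region and the box-to-sparsity budget schedule are not covered by this sketch.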

Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

KerbNet: A QoE-aware Kernel-Based Backdoor Attack Framework

Universal backdoor attack on deep neural networks for malware detection

Comparative Evaluation of Recent Universal Adversarial Perturbations in Image Classification

Universal Backdoor Attacks Detection via Adaptive Adversarial Probe

Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks

Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization

Imperceptible and Multi-channel Backdoor Attack against Deep Neural Networks

Real-time Detection of Practical Universal Adversarial Perturbations

Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks

Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks

Scalable Backdoor Detection in Neural Networks

Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks

Adversarial scratches: Deployable attacks to CNN classifiers

SATBA: An Invisible Backdoor Attack Based On Spatial Attention

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Detection of backdoor attacks using targeted universal adversarial perturbations for deep neural networks

CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing

Backdooring Convolutional Neural Networks via Targeted Weight Perturbations

Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks