Abstract: Machine learning (ML) models that use deep neural networks are vulnerable to backdoor attacks. In such an attack, an adversary inserts a hidden trigger into the model. Any input that contains the trigger is then misclassified to a single target class, while inputs without the trigger are still classified correctly. ML models that contain a backdoor are called Trojan models. Backdoors can have severe consequences in safety-critical cyber and cyber-physical systems when only the outputs of the model are available. Defense mechanisms have been developed and shown to distinguish between outputs from a Trojan model and a non-Trojan model with accuracy > 96 percent in the case of a single-target backdoor attack. Understanding the limitations of a defense mechanism requires constructing examples where the mechanism fails. Current single-target backdoor attacks require one trigger per target class. We introduce a new, more general attack in which a single trigger causes misclassification to more than one target class, where the target depends on the true class of the input. We term this category of attacks multi-target backdoor attacks. We demonstrate that a Trojan model with either a single-target or multi-target trigger can be trained so as to reduce the accuracy of a defense mechanism that seeks to distinguish between outputs from a Trojan and a non-Trojan model. Our approach uses the non-Trojan model as a teacher for the Trojan model and solves a min-max optimization problem between the Trojan model and the defense mechanism. Empirical evaluations demonstrate that our training procedure reduces the accuracy of a state-of-the-art defense mechanism from > 96 percent to 0 percent.
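The difference between single-target and multi-target triggers described in the abstract can be illustrated with a small data-poisoning sketch. This is not the authors' training procedure (the teacher model and the min-max optimization are not shown). It only illustrates, under assumptions, how one trigger can be paired with a target label that depends on the true class. The names `stamp_trigger`, `poison`, `target_of`, and `every`, the corner-patch trigger, and the toy 4x4 images are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Assumption: a grayscale image is a list of rows of pixel intensities in [0, 1].
Image = List[List[float]]


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size patch in the bottom-right corner.

    The corner patch is a hypothetical trigger chosen only for illustration.
    """
    stamped = [row[:] for row in image]  # copy rows so the clean image is untouched
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison(
    dataset: List[Tuple[Image, int]],
    target_of: Dict[int, int],
    every: int = 2,
) -> List[Tuple[Image, int]]:
    """Stamp the trigger on every `every`-th sample and relabel it.

    `target_of` maps a true class to the class the adversary wants instead.
    Samples whose true class is not in `target_of` are left unchanged.
    """
    out = []
    for i, (image, label) in enumerate(dataset):
        if i % every == 0 and label in target_of:
            out.append((stamp_trigger(image), target_of[label]))
        else:
            out.append((image, label))
    return out


# Multi-target: the same trigger sends each true class to a different target.
multi_target = {0: 1, 1: 2, 2: 0}
# Single-target is the special case where every class maps to one target.
single_target = {0: 2, 1: 2, 2: 2}

# Toy dataset: four all-black 4x4 images with true labels 0, 1, 2, 0.
clean = [([[0.0] * 4 for _ in range(4)], label) for label in (0, 1, 2, 0)]
poisoned = poison(clean, multi_target, every=1)
```

Passing `single_target` instead of `multi_target` gives the conventional setting, in which every triggered input is relabeled to the same class.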

Scalable Backdoor Detection in Neural Networks

CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing

An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks

NTD: Non-Transferability Enabled Deep Learning Backdoor Detection

Rethinking the Reverse-engineering of Trojan Triggers

Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning

TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks

SGBA: A Stealthy Scapegoat Backdoor Attack Against Deep Neural Networks

Universal backdoor attack on deep neural networks for malware detection

An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

Sparse Backdoor Attack Against Neural Networks

Stealthy and Flexible Trojan in Deep Learning Framework

MDTD: A Multi Domain Trojan Detector for Deep Neural Networks

Backdoor Mitigation by Correcting the Distribution of Neural Activations

A Practical Trigger-Free Backdoor Attack on Neural Networks

Detecting Trojaned DNNs Using Counterfactual Attributions

A Novel Backdoor Attack Adapted to Transfer Learning

Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization

Defense Against Multi-target Trojan Attacks

PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification

Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks