Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when attackers stealthily manipulate the model's behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle yet critical structural deviations characterizing Trojaned models. In our analysis, we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from input to output layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines, it displays better performance on multiple benchmarks.
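To make "Trojaned training samples" concrete, here is a minimal sketch of patch-trigger data poisoning. It is not taken from the paper. The 3x3 bottom-right patch, the poisoning rate, and the helper names `stamp_trigger` and `poison_dataset` are hypothetical choices. Images are plain nested lists of grayscale intensities so that the example needs no dependencies.

```python
import random
from typing import List, Tuple

# A grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `img` with a size x size patch set to `value` in the
    bottom-right corner (a hypothetical trigger pattern)."""
    out = [row[:] for row in img]  # copy rows so the clean sample is untouched
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(
    data: List[Tuple[Image, int]], target: int, rate: float, seed: int = 0
) -> Tuple[List[Tuple[Image, int]], List[int]]:
    """Stamp the trigger onto a random fraction `rate` of the samples and
    relabel those samples as `target`. Returns the new dataset and the
    sorted indices that were poisoned."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(data)))
    chosen = set(rng.sample(range(len(data)), n_poison))
    poisoned = [
        (stamp_trigger(img), target) if i in chosen else (img, label)
        for i, (img, label) in enumerate(data)
    ]
    return poisoned, sorted(chosen)


# Usage: 100 blank 8x8 images with labels 0..9, 10% poisoned toward class 7.
clean = [([[0.0] * 8 for _ in range(8)], i % 10) for i in range(100)]
poisoned, idx = poison_dataset(clean, target=7, rate=0.1)
```

A model trained on `poisoned` would be expected to associate the corner patch with class 7 while still fitting the clean samples. That hidden association is the behavior a Trojan detector has to expose.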

What problem does this paper attempt to address?

The paper addresses the detection of Trojan attacks in neural networks. A Trojan attack is a type of data-poisoning attack. The attacker injects Trojaned samples into the training set by stamping a specific trigger onto normal training samples and relabeling them with a chosen target class. The trained model behaves normally on clean inputs. However, whenever an input contains the trigger, the model misclassifies it as the target class. This makes Trojan attacks a serious threat to the security of deep neural networks.

The main contribution is a detection method based on topological data analysis. The authors observe significant structural differences between Trojaned and clean models. In particular, Trojaned models develop short-cuts from the input layer to the output layer. Building on this observation, they develop a strategy for robustly detecting Trojaned models. They use tools from topological data analysis, especially persistent homology, to capture and compare high-order structural information across models. The method identifies whether a model has been Trojaned and also localizes structural anomalies within it, which sheds light on how Trojan attacks affect a network.

Through theoretical analysis and experiments, the authors show that the method outperforms existing approaches on multiple benchmarks. It also remains effective at detecting Trojaned models when data are limited. The work thus offers a new perspective and new tools for improving the security of deep neural networks.
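The summary above names persistent homology as the core tool. The following is a minimal sketch of that idea and not the authors' implementation. It assumes that neurons are compared through the absolute Pearson correlation of their activations on a shared batch of probe inputs, with distance 1 - |corr|. It computes only 0-dimensional persistence (connected components) of the resulting Vietoris-Rips filtration, which reduces to Kruskal's minimum-spanning-tree sweep. Richer features that a full analysis may use, such as higher-dimensional homology, are omitted. The names `zero_dim_persistence` and `total_persistence` are hypothetical. Under this construction, a short-cut of the kind the paper reports would plausibly show up as input-side and output-side neurons merging at unusually small filtration values.

```python
import math
from itertools import combinations
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def zero_dim_persistence(activations: List[List[float]]) -> List[Tuple[float, float]]:
    """0-dimensional persistence diagram of a Vietoris-Rips filtration on neurons.

    activations[i] holds neuron i's responses over the same probe inputs.
    Distance d(i, j) = 1 - |corr(i, j)|, so strongly correlated neurons merge early.
    Every neuron is born at 0; a component dies when an edge first joins it to
    another component, which is exactly Kruskal's minimum-spanning-tree sweep.
    """
    n = len(activations)
    edges = sorted(
        # Clamp at 0 to absorb floating-point |corr| values slightly above 1.
        (max(0.0, 1.0 - abs(pearson(activations[i], activations[j]))), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(u: int) -> int:
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    diagram: List[Tuple[float, float]] = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge merges two components, so one of them dies at d
            parent[ri] = rj
            diagram.append((0.0, d))
    diagram.append((0.0, math.inf))  # the last surviving component never dies
    return diagram


def total_persistence(diagram: List[Tuple[float, float]]) -> float:
    """Sum of finite lifetimes: one simple scalar for comparing two networks."""
    return sum(death - birth for birth, death in diagram if math.isfinite(death))


# Usage: three toy neurons observed on four probe inputs.
acts = [
    [1.0, 2.0, 3.0, 4.0],    # neuron 0
    [2.0, 4.0, 6.0, 8.0],    # neuron 1: perfectly correlated with neuron 0
    [1.0, -1.0, 1.0, -1.0],  # neuron 2: weakly anti-correlated with both
]
diagram = zero_dim_persistence(acts)
```

Here neurons 0 and 1 merge immediately (death time 0), and neuron 2 joins at 1 - 2/sqrt(20), about 0.553. One component persists forever. Comparing such diagrams, or summaries like `total_persistence`, between a clean and a suspect model illustrates the kind of comparison described above. A practical detector would likely rely on richer diagrams and a trained classifier.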

Topological Detection of Trojaned Neural Networks

Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

Detecting Trojaned DNNs Using Counterfactual Attributions

Dormant Neural Trojans

PerD: Perturbation Sensitivity-based Neural Trojan Detection Framework on NLP Applications

DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks

CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing

PoTrojan: Powerful Neural-Level Trojan Designs in Deep Learning Models

Trojan Cleansing with Neural Collapse

Trigger Hunting with a Topological Prior for Trojan Detection

Odyssey: Creation, Analysis and Detection of Trojan Models

An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks

Multi-Target Invisibly Trojaned Networks for Visual Recognition and Detection

POSTER: Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

Scalable Backdoor Detection in Neural Networks

A Survey of Trojan Attacks and Defenses to Deep Neural Networks

Programmable Neural Network Trojan for Pre-Trained Feature Extractor

Trojan Signatures in DNN Weights

Detecting AI Trojans Using Meta Neural Analysis