Topological Detection of Trojaned Neural Networks

Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen
DOI: https://doi.org/10.48550/arXiv.2106.06469
2021-06-11
Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model's behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles we discover subtle -- yet critical -- structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from input to output layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.
Machine Learning, Computational Geometry
What problem does this paper attempt to address?
This paper addresses the detection of Trojan attacks on neural networks. A Trojan attack is a type of data-poisoning attack: the attacker injects Trojaned samples into the training set by overwriting part of normal training samples with a specific trigger and relabeling them with a chosen target class. The trained model behaves normally on clean data but misclassifies any input containing the trigger as the target class, which poses a serious threat to the security of deep neural networks.

The paper's main contribution is a detection method based on topological data analysis. The authors observe significant structural differences between Trojaned and clean models, most notably short-cuts from the input layer to the output layer in Trojaned models. Building on this observation, they use tools from topological data analysis, in particular persistent homology, to capture and compare high-order structural information across models. The method can both identify whether a model has been Trojaned and localize structural anomalies within it, which helps reveal how the attack affects the model.

Through theoretical analysis and experiments, the authors report better performance than standard baselines on multiple benchmarks, and the method remains effective when only limited data is available. This offers a new perspective and tool for improving the security of deep neural networks.
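
To make the data-poisoning step described above concrete, the following is a minimal sketch of how a Trojaned training set could be constructed. It is an illustration only and not the authors' code: the function name `poison_dataset`, the square corner patch used as the trigger, the 10% poisoning rate, and the toy 8x8 images are all assumptions chosen for the example.

```python
import numpy as np


def poison_dataset(images, labels, target_class,
                   poison_fraction=0.1, patch_size=3, patch_value=1.0, seed=0):
    """Return poisoned copies of (images, labels) plus the poisoned indices.

    Hypothetical helper: the trigger here is a solid square patch in the
    bottom-right corner, which is just one simple choice of trigger.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    # Pick a random subset of training samples to poison.
    n_poison = int(round(poison_fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Overwrite a corner of each selected image with the trigger patch.
    images[idx, -patch_size:, -patch_size:] = patch_value
    # Relabel the selected samples with the attacker's target class.
    labels[idx] = target_class
    return images, labels, idx


# Toy data (assumed): 100 grayscale 8x8 images with pixel values in [0, 0.5).
data_rng = np.random.default_rng(42)
clean_images = data_rng.random((100, 8, 8)) * 0.5
clean_labels = data_rng.integers(0, 10, size=100)

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    clean_images, clean_labels, target_class=7
)
print(f"{len(poisoned_idx)} of {len(clean_images)} samples carry the trigger")
```

A model trained on `poisoned_images` and `poisoned_labels` would see mostly clean data, which is why it can behave normally on ordinary inputs while still learning to associate the patch with the target class.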