NeuralSanitizer: Detecting Backdoors in Neural Networks
Hong Zhu, Yue Zhao, Shengzhi Zhang, Kai Chen
DOI: https://doi.org/10.1109/tifs.2024.3390599
IF: 7.231
2024-05-10
IEEE Transactions on Information Forensics and Security
Abstract: Deep neural networks (DNNs) are pervasively used in many areas, such as computer vision, speech recognition, and natural language processing. However, recent works show that they are vulnerable to backdoor/Trojan attacks, which severely restricts their usage in various scenarios. In this paper, we propose NeuralSanitizer, a novel approach to detect and remove backdoors in DNNs, capable of capturing various triggers with better accuracy and higher efficiency. In particular, we identify two fundamental properties of triggers, i.e., their effectiveness in the backdoored model and their ineffectiveness in other clean models, and design a novel objective function to reconstruct triggers based on these properties. We then present a new approach that leverages transferability to identify adversarial patches that could be generated during trigger reconstruction, thus detecting backdoors more accurately. We evaluate NeuralSanitizer on real-world backdoored DNNs and achieve a 2.1% false negative rate (FNR) and a 0.9% false positive rate (FPR) on average, significantly outperforming state-of-the-art works by 1~14 times. In addition, NeuralSanitizer can reconstruct triggers up to 25% of the size of the original inputs on average, compared to only 6~10% by existing works. Finally, NeuralSanitizer is also 1~25 times faster than existing works.
computer science, theory & methods; engineering, electrical & electronic
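
The two trigger properties named in the abstract (effective in the backdoored model, ineffective in clean models) can be illustrated with a toy objective. This is a minimal sketch and not the paper's actual objective function: the mask/pattern trigger form, the linear-softmax stand-in models, the weighting term `lam`, and every function name below are assumptions made only for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def stamp(x, mask, pattern):
    # Blend a candidate trigger into inputs: x' = (1 - mask) * x + mask * pattern.
    return (1.0 - mask) * x + mask * pattern

def target_prob(weights, x, target):
    # Mean probability that a toy linear-softmax model assigns to the target label.
    return softmax(x @ weights)[:, target].mean()

def trigger_objective(suspect_w, clean_ws, x, mask, pattern, target, lam=1.0):
    # Lower is better for a candidate trigger (hypothetical formulation).
    stamped = stamp(x, mask, pattern)
    # Property 1: the trigger should push the suspect model toward the target label.
    effective = -np.log(target_prob(suspect_w, stamped, target) + 1e-12)
    # Property 2: the same trigger should NOT push clean models toward that label.
    ineffective = np.mean([target_prob(w, stamped, target) for w in clean_ws])
    return effective + lam * ineffective

# Toy setup: 3 input features, 2 classes, target label 1.
x = np.zeros((4, 3))
suspect_w = np.zeros((3, 2))
suspect_w[0, 1] = 10.0            # suspect model reacts strongly to feature 0
clean_w = np.zeros((3, 2))        # clean stand-in model ignores every feature

mask = np.array([1.0, 0.0, 0.0])
pattern = np.array([1.0, 0.0, 0.0])

score_trigger = trigger_objective(suspect_w, [clean_w], x, mask, pattern, target=1)
score_no_trigger = trigger_objective(suspect_w, [clean_w], x, np.zeros(3), pattern, target=1)
# A pattern that also fools the "clean" model behaves like a transferable adversarial patch.
score_transferable = trigger_objective(suspect_w, [suspect_w], x, mask, pattern, target=1)
print(score_trigger, score_no_trigger, score_transferable)
```

In this toy setting, a candidate that flips only the suspect model scores lower than both an empty trigger and a pattern that also flips the clean model, which is the intuition behind separating real triggers from adversarial patches. A real reconstruction would optimize `mask` and `pattern` by gradient descent against actual DNNs.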