A Defense Method Against Backdoor Attacks in Neural Networks Using an Image Repair Technique

Jiangtao Chen, Huijuan Lu, Wanli Huo, Shicong Zhang, Yuefeng Chen, Yudong Yao
DOI: https://doi.org/10.1109/itme56794.2022.00087
2022-01-01
Abstract: With the rapid development of deep learning research and applications, artificial intelligence security problems such as adversarial examples, universal adversarial patches, and data poisoning have become increasingly prominent. The backdoor attack in particular, a new type of covert attack, leaves deep learning models vulnerable and non-robust. In a backdoor attack, the attacker inserts poisoned samples into the training dataset. Each poisoned sample carries an added trigger and has its label changed to the target label before it takes part in training. The infected model matches the accuracy of a clean model on the normal test set, but when it encounters a poisoned sample, the trigger is activated and the infected model predicts the target label. To solve this problem, model parameter adjustment and poisoned data removal methods are widely used. However, they lack real-time performance and their accuracy is insufficient. In this paper, we propose a new backdoor attack defense method in which trigger reverse engineering is used to recover the correct triggers and an image repair technique is applied, so that data input to the model can be processed in real time without any negative impact on clean samples.
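The poisoning step described in the abstract (add a trigger, relabel to the target label) can be illustrated with a minimal sketch. This is not the authors' code: the function name `poison_dataset`, the white square trigger in the bottom-right corner, the 10% poison rate, and the 28x28 image size are all assumptions chosen for illustration only.

```python
import numpy as np


def poison_dataset(images, labels, target_label, poison_rate=0.1,
                   patch_size=3, seed=0):
    """Return poisoned copies of images and labels, plus the poisoned indices.

    images: float array of shape (N, H, W) with values in [0, 1]
    labels: int array of shape (N,)
    All parameter names and defaults here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()   # leave the caller's arrays untouched
    labels = labels.copy()

    # Pick a random subset of samples to poison.
    n_poison = int(round(poison_rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Assumed trigger: a white square stamped in the bottom-right corner.
    images[idx, -patch_size:, -patch_size:] = 1.0

    # Relabel the poisoned samples to the attacker's target label.
    labels[idx] = target_label
    return images, labels, idx


# Toy data: 100 blank 28x28 images with labels 0..9.
images = np.zeros((100, 28, 28), dtype=np.float32)
labels = np.arange(100) % 10
poisoned_images, poisoned_labels, idx = poison_dataset(
    images, labels, target_label=7
)
```

A model trained on `poisoned_images` and `poisoned_labels` would see the trigger only alongside the target label, which is how the backdoor association is learned while accuracy on clean samples stays largely unaffected.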