A Trigger Sample Detection Scheme Based on Custom Backdoor Behaviors
WANG Shang, LI Xin, SONG Yongli, SU Mang, FU Anmin
DOI: https://doi.org/10.19363/J.cnki.cn10-1380/tn.2022.11.03
2022-01-01
Journal of Cyber Security
Abstract: Deep learning's powerful feature representation and learning capabilities have revitalized fields such as finance and healthcare, but its training process is vulnerable to security threats. An attacker can implant a backdoor by manipulating the training data set or by modifying model weights, known as data poisoning attacks and model poisoning attacks, respectively. The backdoors implanted by both types of attack are highly stealthy: the backdoored model maintains its accuracy on clean data while misclassifying samples embedded with the attacker-specific trigger into a target class. This paper proposes BackDetc, a trigger sample detection scheme based on custom backdoor behaviors, which exploits the essential difference in fit degree between clean samples and trigger samples. The defender injects custom backdoors into the model using tiny defender-chosen triggers, and perturbs each input sample by embedding these custom triggers. The fit degree of an input is measured through the transparency of the custom trigger, and an anomaly detection threshold is calculated with the fit degree of clean samples as a reference, so that samples carrying attacker-specific triggers can be identified. In this way, BackDetc keeps the overhead affordable for resource-limited users and weakens the assumptions required of the backdoor defense, so it can be deployed in a range of real-world applications and is effective against mainstream backdoor attacks as well as the more threatening source-specific backdoor attacks. In experiments, BackDetc is deployed on the MNIST and CIFAR-10 classification tasks, where it outperforms existing trigger sample detection schemes in detection success rate under data poisoning and model poisoning attacks, averaging over 99.8%. 
The influence of the detection false positive rate on detection performance is then explored, showing that the detection effect of BackDetc can be adjusted dynamically; BackDetc achieves a 100% detection success rate on all tasks against the two mainstream backdoor attacks. In addition, a source-specific backdoor attack is implemented on the CIFAR-10 task to evaluate various trigger sample detection schemes: only BackDetc successfully resists this attack, and adjusting the false positive rate raises its detection success rate to 96.2%.
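
The detection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the fit degree, so the sketch assumes a proxy, namely the smallest opacity at which a blended defender trigger makes the model output the custom backdoor label. The function names, the `predict` callable, the sentinel value, and the rule that unusually high scores are flagged are all hypothetical choices made for illustration. Only the general flow (perturb the input with a custom trigger of varying transparency, score it, and compare against a threshold derived from clean samples at a chosen false positive rate) follows the abstract.

```python
import numpy as np


def blend_trigger(x, trigger, mask, alpha):
    """Overlay `trigger` on `x` where mask == 1, with opacity `alpha` in [0, 1]."""
    return x * (1.0 - alpha * mask) + trigger * (alpha * mask)


def fit_degree(predict, x, trigger, mask, custom_label, alphas):
    """Assumed proxy for the fit degree of input `x`.

    Returns the smallest opacity in `alphas` (sorted ascending) at which the
    model predicts `custom_label`. If the custom backdoor never fires, a
    sentinel larger than every tested opacity is returned.
    """
    for alpha in alphas:
        if predict(blend_trigger(x, trigger, mask, alpha)) == custom_label:
            return float(alpha)
    return float(alphas[-1]) + 1.0


def threshold_from_clean(clean_scores, fpr):
    """Score that roughly a fraction `fpr` of clean samples exceed."""
    return float(np.quantile(np.asarray(clean_scores, dtype=float), 1.0 - fpr))


def is_trigger_sample(score, threshold):
    """Flag an input whose score is above the clean-sample threshold (assumed rule)."""
    return bool(score > threshold)
```

In this sketch, raising `fpr` lowers the threshold, so more inputs are flagged at the cost of more false alarms on clean samples, which mirrors the trade-off between false positive rate and detection success rate that the abstract describes.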