MIC: An Effective Defense Against Word-Level Textual Backdoor Attacks

Shufan Yang, Qianmu Li, Zhichao Lian, Pengchuan Wang, Jun Hou
DOI: https://doi.org/10.1007/978-981-99-8076-5_1
2024-01-01
Abstract: Backdoor attacks, which manipulate a model’s output, have attracted significant attention from researchers. However, some existing word-level backdoor attacks on NLP models are difficult to defend against because they are both well concealed and diverse. These covert attacks replace a word with one that looks similar to the human eye but is mapped to a different word vector by the NLP model, which lets them bypass existing defenses. To address this issue, we propose incorporating triple metric learning into the standard training phase of NLP models to defend against existing word-level backdoor attacks. Specifically, metric learning is used to minimize the distance between the vectors of similar words while maximizing the distance between them and the vectors of other words. Additionally, because metric learning may reduce a model’s sensitivity to semantic changes caused by subtle perturbations, we add contrastive learning after the model’s standard training. Experimental results demonstrate that our method performs well against the two most stealthy existing word-level backdoor attacks.
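The metric-learning objective described in the abstract can be illustrated with a generic triplet margin loss. This is a minimal sketch under stated assumptions, not the authors' formulation: the Euclidean distance, the `margin` value, the function names, and the toy two-dimensional vectors are all hypothetical choices made for illustration. The loss is zero once a similar word (the positive) sits closer to the anchor than an unrelated word (the negative) by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (illustrative, not the paper's exact loss).

    Pulls `positive` (a visually similar word) toward `anchor` and pushes
    `negative` (an unrelated word) away until the gap exceeds `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy word vectors (hypothetical values).
anchor = [1.0, 0.0]
positive = [1.0, 0.5]        # distance 0.5 from the anchor
far_negative = [4.0, 4.0]    # distance 5.0 from the anchor
near_negative = [1.0, 1.0]   # distance 1.0 from the anchor

# The far negative already satisfies the margin, so it contributes no loss.
loss_far = triplet_loss(anchor, positive, far_negative)
# The near negative violates the margin, so the loss is positive.
loss_near = triplet_loss(anchor, positive, near_negative)
```

In training, a loss of this shape would be added to the task loss so that similar-looking words end up with nearby embeddings; how the paper combines the terms is not stated in the abstract.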