CBAs: Character-level Backdoor Attacks Against Chinese Pre-trained Language Models

Xinyu He, Fengrui Hao, Tianlong Gu, Liang Chang
DOI: https://doi.org/10.1145/3678007
IF: 2.717
2024-01-01
ACM Transactions on Privacy and Security
Abstract: Pre-trained language models (PLMs) aim to give computers in various domains natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers are injected into the models to make them exhibit the behavior the attacker intends. Unfortunately, existing research on backdoor attacks has mainly focused on English PLMs and paid less attention to Chinese PLMs. Moreover, these existing backdoor attacks do not work well against Chinese PLMs. In this paper, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure that the backdoor is effectively triggered while improving the effectiveness of the backdoor attacks. Then, based on the attacker's ability to access the training dataset, we develop trigger injection mechanisms that use either target label similarity or a masked language model to select the most influential position and insert the trigger there, maximizing the stealth of the backdoor attacks. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs show strong resistance to three state-of-the-art backdoor defense methods.