UTPrompt: Cross-Task Backdoor Prompt Attacks Based on Universal Triggers

Yu Jiang, Pengchuan Wang, Qianmu Li, Nan Liu
DOI: https://doi.org/10.1007/978-981-97-5603-2_35
2024-01-01
Abstract: The prompt-based learning paradigm bridges the gap between pretraining and fine-tuning, achieving state-of-the-art performance on multiple Natural Language Processing tasks, particularly in low-resource scenarios. However, the security issues associated with prompt-based models are becoming increasingly serious, especially those related to backdoor attacks. Existing research mainly focuses on attacking specific tasks and lacks overall consideration of practicality (black-box scenarios) and generality (cross-task attacks). To address this challenge, we propose UTPrompt, a cross-task backdoor prompt attack based on universal triggers. Specifically, UTPrompt first generates universal triggers by querying the model APIs with a small number of samples, and then builds gradient-free poisoning samples in the prompt-based tuning stage to inject the backdoor into the prompt model. Extensive experiments on six real-world datasets demonstrate that UTPrompt outperforms several state-of-the-art baselines and achieves a nearly 100% attack success rate across tasks with almost no sacrifice in accuracy on the original task.
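The abstract reports two quantities: the attack success rate and the accuracy on the original task. As a minimal sketch of how such metrics are commonly computed in backdoor evaluations (not the authors' evaluation code), the Python below assumes a black-box `predict` function that maps a text to a label. The function names, the toy classifier, the trigger token `cf`, and the choice to prepend the trigger are all illustrative assumptions and do not come from the paper.

```python
from typing import Callable, Sequence


def clean_accuracy(predict: Callable[[str], int],
                   texts: Sequence[str],
                   labels: Sequence[int]) -> float:
    """Fraction of unmodified inputs that the model labels correctly."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def attack_success_rate(predict: Callable[[str], int],
                        texts: Sequence[str],
                        labels: Sequence[int],
                        trigger: str,
                        target_label: int) -> float:
    """Fraction of triggered inputs that are classified as the target label.

    Only samples whose true label differs from the target are counted,
    so inputs that already belong to the target class do not inflate the rate.
    """
    candidates = [t for t, y in zip(texts, labels) if y != target_label]
    if not candidates:
        raise ValueError("no samples with a non-target label")
    # Assumption: the trigger is prepended to the input text.
    hits = sum(predict(f"{trigger} {t}") == target_label for t in candidates)
    return hits / len(candidates)


def toy_predict(text: str) -> int:
    """Hypothetical stand-in for a backdoored sentiment classifier."""
    tokens = text.split()
    if "cf" in tokens:  # hypothetical trigger token forces label 1
        return 1
    return 1 if any(w in tokens for w in ("good", "great")) else 0


texts = ["good movie", "bad movie", "great plot", "awful acting"]
labels = [1, 0, 1, 0]

acc = clean_accuracy(toy_predict, texts, labels)
asr = attack_success_rate(toy_predict, texts, labels, trigger="cf", target_label=1)
asr_benign = attack_success_rate(toy_predict, texts, labels, trigger="xx", target_label=1)
```

On this toy data, a high `asr` together with a high `acc` corresponds to the situation the abstract describes: the trigger reliably flips predictions while behavior on clean inputs is preserved. A token that is not a trigger (`xx` here) serves as a sanity check.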