TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song
DOI: https://doi.org/10.48550/arXiv.2405.16783
2024-05-27
Abstract: One key challenge in backdoor attacks against large foundation models is limited resources. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model parameters. This enables TrojFM to efficiently launch downstream task-agnostic backdoor attacks against very large foundation models under limited computational resources. Moreover, we optimize the fine-tuning process with our customized QLoRA technique, so that the attack can be launched with only one A100 GPU. Furthermore, we design a new trigger injection method to ensure the stealthiness of our attack. Through extensive experiments, we first demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models without jeopardizing their normal functionalities (and that it outperforms existing attacks on BERT-style models). Furthermore, we show that TrojFM is resilient to SOTA defenses and is insensitive to changes in key hyper-parameters. Finally, we conduct a resource analysis to quantify that our method can significantly save computational and memory costs compared to existing backdoor attacks.
Cryptography and Security, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the technical challenge of mounting backdoor attacks on very large foundation models (such as Llama-3-70B) under tight resource limits. Existing backdoor attacks usually require retraining the target model, which is computationally impractical at this scale. They are also mainly aimed at supervised classifiers or smaller foundation models (such as BERT), and none has successfully compromised a very large foundation model, especially with limited computational resources. To address these challenges, the paper proposes **TrojFM**, a backdoor attack designed for very large foundation models. Its main technical contribution is a new backdoor injection method that fine-tunes only a very small proportion of the model's parameters so that the backdoored model generates similar hidden representations for poisoned inputs, regardless of their actual semantics. This lets **TrojFM** launch downstream task-agnostic backdoor attacks on very large foundation models efficiently. The fine-tuning process is further optimized with a customized **QLoRA** technique, so the attack can be carried out on a single A100 GPU. The paper also designs a new trigger injection method to keep the attack stealthy. Extensive experiments show that **TrojFM** can effectively backdoor widely used large GPT-style models without affecting their normal functionality, and that it outperforms existing attacks on BERT-style models. **TrojFM** is also resilient to state-of-the-art defenses and insensitive to changes in key hyperparameters.
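As a rough illustration of what "similar hidden representations for poisoned inputs" could mean as a training signal, the sketch below scores a batch of hidden vectors by one minus their mean pairwise cosine similarity. This is an assumed objective for illustration only; the summary does not state the loss TrojFM actually uses, and the name `similarity_loss` is hypothetical.

```python
import numpy as np


def similarity_loss(hidden: np.ndarray) -> float:
    """Return 1 - mean pairwise cosine similarity over the rows of `hidden`.

    `hidden` has shape (n, d) with n >= 2: one hidden vector per poisoned input.
    Hypothetical objective, not the loss from the paper.
    """
    # Normalize each row to unit length (epsilon guards against zero vectors).
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    normed = hidden / (norms + 1e-12)
    # Cosine similarity between every pair of rows.
    cos = normed @ normed.T
    # Drop the diagonal, where each vector is compared with itself.
    n = hidden.shape[0]
    off_diag = cos[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())


# Vectors pointing the same way give a loss near 0; orthogonal ones give 1.
aligned = np.array([[1.0, 2.0], [2.0, 4.0]])
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_aligned = similarity_loss(aligned)
loss_orthogonal = similarity_loss(orthogonal)
```

Under this assumed objective, driving the loss toward zero on poisoned inputs would make their representations collapse together regardless of what the inputs say, which is the behavior the summary describes.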
Finally, the paper presents a resource analysis that quantifies the savings in computational and memory costs relative to existing backdoor attacks. In summary, the paper tackles the problem of implementing efficient, task-agnostic backdoor attacks on very large foundation models under tight resource limits.
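A back-of-the-envelope calculation suggests why quantization matters for the single-GPU claim. The sketch below estimates weight storage only, assuming roughly 70 billion parameters (as the model name suggests) and comparing 16-bit weights with the 4-bit weights that QLoRA-style methods use. It ignores activations, quantization constants, adapter weights, and optimizer state, and it does not reproduce the paper's own resource analysis. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 70e9  # assumed parameter count for a 70B-class model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {int4_gb:.0f} GB")
```

At 16 bits the weights alone come to about 140 GB, more than either the 40 GB or 80 GB variant of the A100 can hold, whereas at 4 bits they come to about 35 GB.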