Abstract: One key challenge in backdoor attacks against large foundation models is resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model parameters. This enables TrojFM to efficiently launch downstream task-agnostic backdoor attacks against very large foundation models under limited computational resources. Moreover, we optimize the fine-tuning process with our customized QLoRA technique, enabling us to launch our attack with only one A100 GPU. Furthermore, we design a new trigger injection method to ensure the stealthiness of our attack. Through extensive experiments, we first demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models without jeopardizing their normal functionalities (while outperforming existing attacks on BERT-style models). Furthermore, we show that TrojFM is resilient to SOTA defenses and is insensitive to changes in key hyper-parameters. Finally, we conduct a resource analysis to quantify that our method can significantly save computational and memory costs compared to existing backdoor attacks.
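
To give a rough sense of what "fine-tuning only a very small proportion of model parameters" can mean, the sketch below counts the trainable parameters that a standard LoRA adapter adds to a single weight matrix. It illustrates plain LoRA arithmetic only, not the paper's customized QLoRA technique, and the layer size (4096 x 4096) and rank (8) are hypothetical values chosen for the example.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a standard LoRA adapter for one d_out x d_in weight.

    LoRA keeps the original weight frozen and learns two low-rank factors,
    B (d_out x rank) and A (rank x d_in), so it trains rank * (d_in + d_out)
    values instead of d_out * d_in.
    """
    return rank * (d_in + d_out)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of the full weight matrix that the adapter actually trains."""
    return lora_trainable_params(d_out, d_in, rank) / (d_out * d_in)


# Hypothetical layer size and rank, chosen only for illustration.
D_OUT, D_IN, RANK = 4096, 4096, 8

print(lora_trainable_params(D_OUT, D_IN, RANK))  # 65536
print(trainable_fraction(D_OUT, D_IN, RANK))     # 0.00390625 (about 0.39%)
```

Under these assumed values the adapter trains well under 1% of the layer's weights, which is the general reason adapter-style fine-tuning fits in far less memory than full retraining.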

What problem does this paper attempt to address?

This paper addresses the challenge of mounting backdoor attacks on very large foundation models (such as Llama-3-70B) with limited computational resources. Existing backdoor attacks usually require retraining the target model, which is impractical at this scale. They also mainly target supervised classifiers or smaller foundation models (such as BERT), and none has successfully compromised a very large foundation model, especially under a limited compute budget. To address this, the paper proposes **TrojFM**, a backdoor attack designed for very large foundation models. Its main technical contribution is a new backdoor injection method that fine-tunes only a very small proportion of the model's parameters so that the backdoored model generates similar hidden representations for poisoned inputs, regardless of their actual semantics. This lets **TrojFM** launch downstream-task-agnostic backdoor attacks on very large foundation models efficiently. The fine-tuning process is further optimized with a customized **QLoRA** technique, so the attack can be carried out on a single A100 GPU. The paper also designs a new trigger injection method to keep the attack stealthy. Extensive experiments show that **TrojFM** can backdoor widely used large GPT-style models without jeopardizing their normal functionality, and that it outperforms existing attacks on BERT-style models. **TrojFM** is also resilient to state-of-the-art defenses and insensitive to changes in key hyperparameters.
Finally, a resource analysis quantifies the savings in computational and memory costs relative to existing backdoor attacks. In short, the paper tackles efficient, task-agnostic backdoor attacks on very large foundation models under tight resource limits.
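
The idea of forcing similar hidden representations for poisoned inputs can be sketched as a toy objective. The abstract does not give the paper's actual loss, so everything here is an assumption made for illustration: the use of cosine similarity, the two-term structure, and the names `backdoor_objective`, `poisoned_reps`, `clean_reps`, `clean_reps_ref`, and `weight` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(reps):
    """Average cosine similarity over all distinct pairs (needs >= 2 vectors)."""
    pairs = [(i, j) for i in range(len(reps)) for j in range(i + 1, len(reps))]
    return sum(cosine(reps[i], reps[j]) for i, j in pairs) / len(pairs)


def backdoor_objective(poisoned_reps, clean_reps, clean_reps_ref, weight=1.0):
    """Toy loss (an assumption, not the paper's formulation).

    attack term : 1 - mean pairwise cosine among poisoned representations,
                  which reaches 0 when all poisoned inputs map to one direction.
    utility term: mean (1 - cosine) between the tuned model's clean
                  representations and those of a frozen reference model,
                  which reaches 0 when clean behavior is unchanged.
    """
    attack = 1.0 - mean_pairwise_cosine(poisoned_reps)
    utility = sum(
        1.0 - cosine(c, r) for c, r in zip(clean_reps, clean_reps_ref)
    ) / len(clean_reps)
    return attack + weight * utility


# Poisoned inputs with different semantics but aligned representations,
# and clean representations identical to the reference: loss is near 0.
aligned = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
clean = [[0.0, 1.0], [1.0, 1.0]]
print(backdoor_objective(aligned, clean, clean))
```

In this sketch, minimizing the first term pulls poisoned representations together, while the second term penalizes drift on clean inputs, mirroring the stated goal of attacking without jeopardizing normal functionality.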

TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
