Abstract: Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. To address these challenges, we introduce BadMerging, the first backdoor attack specifically designed for MM. Notably, BadMerging allows an adversary to compromise the entire merged model by contributing as few as one backdoored task-specific model. BadMerging comprises a two-stage attack mechanism and a novel feature-interpolation-based loss to enhance the robustness of embedded backdoors against the changes of different merging parameters. Considering that a merged model may incorporate tasks from different domains, BadMerging can jointly compromise the tasks provided by the adversary (on-task attack) and other contributors (off-task attack) and solve the corresponding unique challenges with novel attack designs. Extensive experiments show that BadMerging achieves remarkable attacks against various MM algorithms. Our ablation study demonstrates that the proposed attack designs can progressively contribute to the attack performance. Finally, we show that prior defense mechanisms fail to defend against our attacks, highlighting the need for more advanced defense.

What problem does this paper attempt to address?

This paper examines the security vulnerabilities introduced by the model merging (MM) process, in particular backdoor attacks against MM. Specifically:

1. **Identifying new attack surfaces**: Although model merging can effectively improve multi-task performance, the process may introduce new security risks. In particular, an attacker can influence the entire merged model by contributing a single backdoored task-specific model.
2. **Designing a specialized backdoor attack**: To address the limitations of existing backdoor attacks in the model-merging scenario, the paper proposes a new attack framework named BadMerging. The framework can compromise the entire merged model through one backdoored task-specific model.
3. **Coping with different merging parameters**: Existing backdoor attacks perform poorly under different merging parameters because each model is rescaled during merging, which can cause the backdoor to disappear. BadMerging introduces a feature-interpolation-based loss that makes the embedded backdoor robust to changes in the merging parameters.
4. **Handling cross-task attacks**: Because the merged model can integrate tasks from different domains, an attacker may not know all of the tasks in advance. BadMerging therefore distinguishes on-task attacks, which target the task contributed by the adversary, from off-task attacks, which target tasks from other contributors, so that the attack remains effective even when the other tasks are unknown.

### Specific problem description

- **Background**: With the wide adoption of pre-trained models, fine-tuning them for downstream tasks has become common practice. However, a fine-tuned model usually focuses on one specific task, which limits its generality and efficiency. Model merging addresses this by directly combining multiple task-specific models without additional training, thereby improving multi-task performance.
- **Problem**: Although model merging is convenient, it also brings potential security risks. In particular, an attacker can release a backdoored task-specific model and rely on the merging process to carry the backdoor into the final merged model, thereby affecting multiple downstream tasks.

### Core contributions of the paper

1. **Discovering new attack surfaces**: The paper is the first to reveal the backdoor attack challenges unique to the model-merging process, and it proposes BadMerging as a solution.
2. **Designing a two-stage attack mechanism**: BadMerging adopts a two-stage attack mechanism and introduces a loss function based on feature interpolation to enhance the robustness of the backdoor.
3. **Handling cross-task attacks**: The paper proposes two new techniques, shadow classes and adversarial data augmentation, to improve the effectiveness of off-task attacks.
4. **Experimental verification**: Extensive experiments verify the effectiveness and practicality of BadMerging and show that existing defense mechanisms fail against such attacks, emphasizing the need for more advanced defenses.

In summary, this paper reveals new security threats in the model-merging process and proposes a new attack framework, providing a reference for future AI security research.
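
The rescaling effect described in item 3 above can be illustrated with a toy example. The sketch below assumes a task-arithmetic-style merging rule (pre-trained weights plus a scaled sum of per-model weight differences); the summary does not say which merging algorithms the paper evaluates, so this rule, the function names, and the numbers are all illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch only: a task-arithmetic-style merge on toy weight vectors.
# The merging rule, function names, and numbers are assumptions for illustration;
# they are not taken from the BadMerging paper.

def task_vector(finetuned, pretrained):
    """Per-weight difference between a fine-tuned model and the pre-trained one."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def merge_task_arithmetic(pretrained, finetuned_models, coeff):
    """Return pretrained + coeff * (sum of all task vectors)."""
    merged = list(pretrained)
    for model in finetuned_models:
        for i, delta in enumerate(task_vector(model, pretrained)):
            merged[i] += coeff * delta
    return merged


pretrained = [0.0, 0.0, 0.0]
benign = [1.0, 0.0, 0.0]      # contributor A's fine-tuned weights
backdoored = [0.0, 1.0, 2.0]  # contributor B's weights; index 2 stands in for the backdoor

# The merging coefficient is chosen by whoever merges the models, not by the attacker.
merged = merge_task_arithmetic(pretrained, [benign, backdoored], coeff=0.3)
weaker = merge_task_arithmetic(pretrained, [benign, backdoored], coeff=0.1)

# The "backdoor" weight shrinks in proportion to the coefficient.
print(merged[2], weaker[2])
```

In this toy setting the weight that stands in for the backdoor is scaled down by the merging coefficient, which is outside the attacker's control. That is the intuition behind why a backdoor trained for the unmerged model may not survive merging, and why robustness to the merging parameters matters for the attack.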

BadMerging: Backdoor Attacks Against Model Merging

B3: Backdoor Attacks Against Black-box Machine Learning Models

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

LoBAM: LoRA-Based Backdoor Attack on Model Merging

Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models

CAMH: Advancing Model Hijacking Attack in Machine Learning

Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning

Mithridates: Auditing and Boosting Backdoor Resistance of Machine Learning Pipelines

Exploring the Vulnerability of Self-supervised Monocular Depth Estimation Models

Multi-Task Models Adversarial Attacks

Behavior Backdoor for Deep Learning Models

Multi-target Backdoor Attacks for Code Pre-trained Models

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation

Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability

Neutralizing Backdoors through Information Conflicts for Large Language Models

Enhanced Coalescence Backdoor Attack Against DNN Based on Pixel Gradient

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models