Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu, Zhengyu Chen, Li Shen, Fei Wu

2024-10-17

Abstract: Model merging has gained significant attention as a cost-effective approach to integrate multiple single-task fine-tuned models into a unified one that can perform well on multiple tasks. However, existing model merging techniques primarily focus on resolving conflicts between task-specific models and often overlook potential security threats, particularly the risk of backdoor attacks in the open-source model ecosystem. In this paper, we first investigate the vulnerabilities of existing model merging methods to backdoor attacks, identifying two critical challenges: backdoor succession and backdoor transfer. To address these issues, we propose a novel Defense-Aware Merging (DAM) approach that simultaneously mitigates task interference and backdoor vulnerabilities. Specifically, DAM employs a meta-learning-based optimization method with dual masks to identify a shared and safety-aware subspace for model merging. These masks are alternately optimized: the Task-Shared mask identifies common beneficial parameters across tasks, aiming to preserve task-specific knowledge while reducing interference, while the Backdoor-Detection mask isolates potentially harmful parameters to neutralize security threats. This dual-mask design allows us to carefully balance the preservation of useful knowledge and the removal of potential vulnerabilities. Compared to existing merging methods, DAM achieves a more favorable balance between performance and security, reducing the attack success rate by 2-10 percentage points while sacrificing only about 1% in accuracy. Furthermore, DAM exhibits robust performance and broad applicability across various types of backdoor attacks and varying numbers of compromised models involved in the merging process. We will release the code and models soon.

Cryptography and Security, Machine Learning

What problem does this paper attempt to address?

This paper addresses backdoor attacks that arise during multi-task model merging. Existing model merging techniques mainly focus on resolving conflicts between task-specific models and often overlook potential security threats, especially the risk of backdoor attacks in the open-source model ecosystem. The paper identifies two key challenges:

1. **Backdoor Succession**: Harmful elements from one or more backdoored models persist in the merged model.
2. **Backdoor Transfer**: Harmful elements from backdoored models spread to clean models, affecting their security and performance.

To address these challenges, the authors propose a new Defense-Aware Merging (DAM) algorithm, which simultaneously mitigates task interference and backdoor vulnerabilities by identifying a shared and safety-aware subspace. DAM adopts a meta-learning-based optimization method and introduces two specialized masks:

- **Task-Shared Mask**: Identifies the common beneficial parameters across tasks, aiming to preserve task-specific knowledge and reduce interference.
- **Backdoor-Detection Mask**: Detects parameters that may be related to backdoor threats, so that harmful elements can be isolated and neutralized.

The two masks are alternately optimized to remove potential vulnerabilities while retaining useful knowledge. Experimental results show that, compared to existing merging methods, DAM reduces the attack success rate by 2-10 percentage points while sacrificing only about 1% in accuracy.

In summary, the main contribution of this paper lies in revealing the vulnerability of current multi-task merging methods to backdoor attacks and proposing an effective solution that balances performance and security.
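To make the dual-mask idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a task-arithmetic style merge (adding scaled differences between fine-tuned and pretrained weights), flattened toy parameter vectors, and fixed binary masks set by hand. In DAM the masks are learned through alternating meta-learning optimization, which this sketch does not attempt to reproduce. The names `task_vector`, `masked_merge`, `shared_mask`, `backdoor_mask`, and `scale` are hypothetical.

```python
from typing import List

Params = List[float]  # flattened toy parameter vector (stand-in for model weights)


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Per-parameter difference between a fine-tuned model and the pretrained base."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def masked_merge(
    pretrained: Params,
    finetuned_models: List[Params],
    shared_mask: List[int],
    backdoor_mask: List[int],
    scale: float = 1.0,
) -> Params:
    """Merge models, keeping only parameters that are marked as shared
    (shared_mask == 1) and not flagged as suspicious (backdoor_mask == 0).

    Assumption: both masks are fixed 0/1 vectors; the paper learns them instead.
    """
    merged = list(pretrained)
    for model in finetuned_models:
        delta = task_vector(model, pretrained)
        for i, d in enumerate(delta):
            keep = shared_mask[i] * (1 - backdoor_mask[i])
            merged[i] += scale * keep * d
    return merged


# Toy example: index 2 is pretended to carry a backdoor, index 3 is not shared.
pretrained = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, 0.0, 2.0, 0.0]
model_b = [0.0, 1.0, 3.0, 0.5]
shared_mask = [1, 1, 1, 0]
backdoor_mask = [0, 0, 1, 0]

merged = masked_merge(pretrained, [model_a, model_b], shared_mask, backdoor_mask, scale=0.5)
print(merged)  # [0.5, 0.5, 0.0, 0.0]
```

In this toy setup, the update at index 2 is dropped because the backdoor mask flags it, and the update at index 3 is dropped because the shared mask excludes it. This illustrates how the two masks jointly define the subspace in which merging takes place.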

BadMerging: Backdoor Attacks Against Model Merging

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

LoBAM: LoRA-Based Backdoor Attack on Model Merging

Neutralizing Backdoors through Information Conflicts for Large Language Models

Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning

Exploring the Vulnerability of Self-supervised Monocular Depth Estimation Models

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

Merging by Matching Models in Task Parameter Subspaces

On Provable Backdoor Defense in Collaborative Learning

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

A general approach to enhance the survivability of backdoor attacks by decision path coupling

Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Adversarial Feature Map Pruning for Backdoor

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch