Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu, Zhengyu Chen, Li Shen, Fei Wu
2024-10-17
Abstract: Model merging has gained significant attention as a cost-effective approach to integrate multiple single-task fine-tuned models into a unified one that can perform well on multiple tasks. However, existing model merging techniques primarily focus on resolving conflicts between task-specific models and often overlook potential security threats, particularly the risk of backdoor attacks in the open-source model ecosystem. In this paper, we first investigate the vulnerabilities of existing model merging methods to backdoor attacks, identifying two critical challenges: backdoor succession and backdoor transfer. To address these issues, we propose a novel Defense-Aware Merging (DAM) approach that simultaneously mitigates task interference and backdoor vulnerabilities. Specifically, DAM employs a meta-learning-based optimization method with dual masks to identify a shared and safety-aware subspace for model merging. These masks are alternately optimized: the Task-Shared mask identifies common beneficial parameters across tasks, aiming to preserve task-specific knowledge while reducing interference, while the Backdoor-Detection mask isolates potentially harmful parameters to neutralize security threats. This dual-mask design allows us to carefully balance the preservation of useful knowledge and the removal of potential vulnerabilities. Compared to existing merging methods, DAM achieves a more favorable balance between performance and security, reducing the attack success rate by 2-10 percentage points while sacrificing only about 1% in accuracy. Furthermore, DAM exhibits robust performance and broad applicability across various types of backdoor attacks and varying numbers of compromised models involved in the merging process. We will release the code and models soon.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper addresses backdoor attacks in multi-task model merging. Existing model merging techniques mainly focus on resolving conflicts between task-specific models and often overlook potential security threats, especially the risk of backdoor attacks in the open-source model ecosystem. The paper identifies two key challenges:

1. **Backdoor Succession**: harmful elements from one or more backdoored models persist in the merged model.
2. **Backdoor Transfer**: harmful elements from backdoored models spread to clean models, affecting their security and performance.

To address these challenges, the authors propose a new Defense-Aware Merging (DAM) algorithm, which simultaneously mitigates task interference and backdoor vulnerabilities by identifying a shared and safety-aware subspace. DAM adopts a meta-learning-based optimization method and introduces two specialized masks:

- **Task-Shared Mask**: identifies the common beneficial parameters across tasks, aiming to preserve task-specific knowledge and reduce interference.
- **Backdoor-Detection Mask**: detects parameters that may be related to backdoor threats, then isolates and neutralizes these harmful elements.

The two masks are alternately optimized so that potential vulnerabilities are removed while useful knowledge is retained. Experimental results show that DAM reduces the attack success rate by 2-10 percentage points compared to existing merging methods, while sacrificing only about 1% in accuracy.

In summary, the main contribution of this paper lies in revealing the vulnerability of current multi-task merging methods to backdoor attacks and proposing an effective solution that balances performance and security.
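To make the dual-mask idea concrete, the following is a minimal sketch of how two binary masks might be applied when merging parameter deltas. It is not the authors' implementation: the summary gives no formulas, so the additive (task-arithmetic-style) combination, the function name `merge_with_dual_masks`, the `scale` parameter, and the hand-written masks are all assumptions made for illustration. In DAM the masks are learned through alternating meta-learning-based optimization, which this sketch does not attempt to reproduce.

```python
import numpy as np


def merge_with_dual_masks(pretrained, finetuned, shared_mask, backdoor_mask, scale=1.0):
    """Illustrative only: merge fine-tuned weights under two binary masks.

    All names here are hypothetical. The masks are taken as given inputs;
    learning them is the part of DAM that this sketch leaves out.
    """
    pretrained = np.asarray(pretrained, dtype=float)
    # Per-model parameter deltas relative to the shared pretrained weights.
    deltas = [np.asarray(w, dtype=float) - pretrained for w in finetuned]
    # Keep a coordinate only if it is marked shared AND not marked suspicious.
    keep = np.asarray(shared_mask, dtype=bool) & ~np.asarray(backdoor_mask, dtype=bool)
    # Assumed combination rule: scaled sum of the deltas.
    merged_delta = scale * np.sum(deltas, axis=0)
    # Masked-out coordinates fall back to the pretrained value.
    return pretrained + np.where(keep, merged_delta, 0.0)


# Toy example: two "models" with four parameters each.
pretrained = np.zeros(4)
finetuned = [np.array([1.0, 0.0, 2.0, 0.0]), np.array([0.0, 1.0, 2.0, 5.0])]
shared_mask = np.array([1, 1, 1, 1])    # every coordinate treated as shared
backdoor_mask = np.array([0, 0, 0, 1])  # last coordinate flagged as suspicious
merged = merge_with_dual_masks(pretrained, finetuned, shared_mask, backdoor_mask, scale=0.5)
print(merged)
```

In this toy setup the flagged coordinate is reset to its pretrained value, while the remaining coordinates receive the scaled sum of the two models' deltas. A real system would apply the same logic per parameter tensor rather than to a flat vector.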