LoBAM: LoRA-Based Backdoor Attack on Model Merging

Ming Yin, Jingyang Zhang, Jingwei Sun, Minghong Fang, Hai Li, Yiran Chen
2024-11-24
Abstract: Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. This scheme, in the meantime, may open up backdoor attack opportunities where one single malicious model can jeopardize the integrity of the merged model. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources, focusing on cases where the attacker can fully fine-tune the pre-trained model. Such an assumption, however, may not be feasible given the increasing size of machine learning models. In practice where resources are limited and the attacker can only employ techniques like Low-Rank Adaptation (LoRA) to produce the malicious model, it remains unclear whether the attack can still work and pose threats. In this work, we first identify that the attack efficacy is significantly diminished when using LoRA for fine-tuning. Then, we propose LoBAM, a method that yields high attack success rate with minimal training resources. The key idea of LoBAM is to amplify the malicious weights in an intelligent way that effectively enhances the attack efficacy. We demonstrate that our design can lead to improved attack success rate through both theoretical proof and extensive empirical experiments across various model merging scenarios. Moreover, we show that our method has strong stealthiness and is difficult to detect.
Cryptography and Security, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is: in a low-resource setting, can a malicious model fine-tuned with LoRA (Low-Rank Adaptation) still mount an effective backdoor attack on the model merging process? Existing research shows that a malicious user can manipulate the behavior of the final merged model by uploading a backdoored model. However, most of these studies assume that the attacker has enough computing resources for full fine-tuning, which is not always feasible in practice. When the attacker can only use a resource-limited fine-tuning method such as LoRA, the effectiveness of existing attack methods drops significantly.

To fill this research gap, the authors propose a new attack algorithm, LoBAM (LoRA-Based Backdoor Attack on Model Merging), which adjusts the weights of the malicious model so that an effective backdoor attack can be achieved even in a low-resource setting. The key idea of LoBAM is to intelligently amplify the weights associated with the attack, which strengthens the attack while keeping it stealthy and difficult to detect.

### Main contributions of the paper:

1. **Reveal the limitations of existing attack methods**: In a low-resource setting (using LoRA for fine-tuning), existing attack methods lose much of their effectiveness.
2. **Propose a new attack method**: LoBAM can still carry out effective backdoor attacks under resource-constrained conditions and is supported by a theoretical proof.
3. **Verify the effectiveness of the method through experiments**: Extensive experiments show that LoBAM performs well across multiple model merging scenarios, with high attack success rates and strong stealthiness.
### Formula representation

The formulas involved in the paper are as follows:

- Parameter update formula after model merging:

  \[ \Delta \theta_{\text{merged}} = \text{Agg}(\Delta \theta_1, \Delta \theta_2, \ldots, \Delta \theta_n) \]

  \[ \theta_{\text{merged}} = \theta_{\text{pre}} + \Delta \theta_{\text{merged}} \]

- Construction formula of LoBAM:

  \[ \theta_{\text{upload}} = \lambda(\theta_{\text{malicious}} - \theta_{\text{benign}}) + \theta_{\text{benign}} \]

- Theoretical analysis of attack success rate:

  \[ Y = \theta_{\text{pre}} + \frac{1}{N}\left(\sum_{i = 1, i \neq k}^{N} \Delta \theta_i + \Delta \theta_k^{\prime\, m}\right) \]

  \[ X = \theta_{\text{pre}} + \frac{1}{N}\left(\sum_{i = 1, i \neq k}^{N} \Delta \theta_i + \lambda\left(\Delta \theta_k^{\prime\, m} - \Delta \theta_k^{\prime\, b}\right) + \Delta \theta_k^{\prime\, b}\right) \]

  When \(\lambda > 1 + \frac{G}{\mu N \left\|\Delta \theta_k^{\prime\, m} - \Delta \theta_k^{\prime\, b}\right\|}\), we have \(g(X) > g(Y)\), where \(g\) represents the attack success rate.

Through these formulas and detailed experimental results, the paper demonstrates the efficiency and stealth of LoBAM in a low-resource environment, providing a new perspective for the security research of model merging.
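
To make the formulas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that Agg is plain averaging (as in the \(\frac{1}{N}\) form of \(X\) and \(Y\)) and represents each model as a short flat list of floats. All function names and numeric values are hypothetical and chosen only for illustration. Under these assumptions, subtracting \(Y\) from \(X\) gives \(X - Y = \frac{\lambda - 1}{N}\,(\theta_{\text{malicious}} - \theta_{\text{benign}})\), which is the shift the code checks numerically.

```python
# Illustrative sketch only: weights are flat lists of floats and
# merging is simple averaging. Names and numbers are hypothetical.

def task_vector(theta, theta_pre):
    """Delta theta = theta - theta_pre."""
    return [t - p for t, p in zip(theta, theta_pre)]

def average_merge(theta_pre, task_vectors):
    """theta_merged = theta_pre + (1/N) * sum_i Delta theta_i."""
    n = len(task_vectors)
    merged_delta = [sum(column) / n for column in zip(*task_vectors)]
    return [p + d for p, d in zip(theta_pre, merged_delta)]

def lobam_upload(theta_malicious, theta_benign, lam):
    """theta_upload = lam * (theta_malicious - theta_benign) + theta_benign."""
    return [lam * (m - b) + b for m, b in zip(theta_malicious, theta_benign)]

theta_pre = [0.0, 0.0, 0.0]
# Task vectors contributed by the other (benign) participants.
other_deltas = [[0.5, -0.25, 0.0], [0.25, 0.5, -0.5], [-0.25, 0.0, 0.5]]
# The attacker's two fine-tuned models (toy values).
theta_benign = [0.5, 0.5, 0.5]
theta_malicious = [0.5, 0.75, 1.0]
lam = 2.0
N = len(other_deltas) + 1  # total number of merged models

# Y: merged model when the unamplified malicious model is uploaded.
Y = average_merge(
    theta_pre, other_deltas + [task_vector(theta_malicious, theta_pre)]
)

# X: merged model when the amplified (LoBAM-style) model is uploaded.
theta_upload = lobam_upload(theta_malicious, theta_benign, lam)
X = average_merge(
    theta_pre, other_deltas + [task_vector(theta_upload, theta_pre)]
)

# The merged weights move by (lam - 1) / N along (malicious - benign).
shift = [x - y for x, y in zip(X, Y)]
expected = [(lam - 1) / N * (m - b)
            for m, b in zip(theta_malicious, theta_benign)]
print(shift, expected)
```

In this toy setting, averaging over \(N\) models scales the attacker's contribution by \(\frac{1}{N}\), and choosing \(\lambda > 1\) scales the malicious-minus-benign direction back up. The sketch does not model the attack success rate \(g\) or the threshold on \(\lambda\).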