CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification

Fangwen Mu, Junjie Wang, Zhuohao Yu, Lin Shi, Song Wang, Mingyang Li, Qing Wang
2024-10-26
Abstract: Neural code models have found widespread success in tasks pertaining to code intelligence, yet they are vulnerable to backdoor attacks, where an adversary can manipulate the victim model's behavior by inserting triggers into the source code. Recent studies indicate that advanced backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks. However, effective defense techniques against such attacks remain insufficiently explored. In this study, we propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification. Entropy-based purification involves the process of precisely detecting and eliminating the possible triggers in the source code while preserving its semantic information. Within this process, CodePurify first develops a confidence-driven entropy-based measurement to determine whether a code snippet is poisoned and, if so, locates the triggers. Subsequently, it purifies the code by substituting the triggers with benign tokens using a masked language model. We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models. The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% across the three tasks, respectively. These findings highlight the potential of CodePurify to serve as a robust defense against backdoor attacks on neural code models.
Cryptography and Security, Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the problem of backdoor attacks on neural code models. Specifically:

1. **Background and Challenges**:
   - Neural code models perform well in various software engineering tasks, such as defect detection, program repair, and code generation.
   - However, these models are vulnerable to backdoor attacks. An attacker inserts specific triggers into the training data, so that the model produces attacker-specified outputs on inputs containing the trigger while behaving normally on clean inputs.
   - The success rate of such attacks can reach nearly 100%, and a backdoored model is almost impossible to distinguish from a benign one until a trigger input is encountered.
2. **Insufficiencies of Existing Defense Methods**:
   - Although some defenses against backdoor attacks exist (such as CodeDetector and OSeqL), they offer limited protection against evolving backdoor attacks.
   - Existing defenses do not generalize across attack types. For example, they are effective against dead-code-insertion attacks but perform poorly against other types, such as identifier renaming.
3. **Research Objectives**:
   - Propose a new defense method, **CodePurify**, which detects and eliminates triggers in source code through an entropy-based purification strategy, thereby protecting neural code models from backdoor attacks.
   - The goals of CodePurify are:
     - **Accurately Detect and Locate Triggers**: Use a confidence-driven, entropy-based measurement to determine whether a code snippet is poisoned and, if so, to locate the trigger.
     - **Purify the Code**: Use a masked language model to replace the trigger with benign tokens, producing purified code that preserves the original semantic information.
4. **Experimental Verification**:
   - The authors evaluated CodePurify against four advanced backdoor attacks on three representative tasks (defect detection, clone detection, and program repair) and two popular code models (CodeBERT and CodeT5).
   - The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% on the three tasks, respectively.

In conclusion, this paper proposes a novel defense mechanism against backdoor attacks on neural code models, aimed at improving the security and reliability of these models.
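To make the detect-locate-purify idea described above more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the summary does not spell out the exact confidence-driven measurement, so the sketch assumes one plausible reading, namely that masking a trigger token turns a backdoored model's near-certain prediction into a less certain one, which shows up as a jump in Shannon entropy. The names `locate_trigger`, `purify`, `toy_predict`, `toy_fill`, the `<mask>` placeholder, and the `threshold` value are all hypothetical; `toy_predict` and `toy_fill` stand in for a real code model and a real masked language model.

```python
import math
from typing import Callable, List, Optional, Sequence

MASK = "<mask>"  # hypothetical mask placeholder, not taken from the paper


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def locate_trigger(
    tokens: List[str],
    predict: Callable[[List[str]], Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, chosen only for this toy example
) -> Optional[int]:
    """Return the index of the token whose masking raises prediction entropy
    the most, or None if no increase exceeds the threshold (treated as clean)."""
    base = shannon_entropy(predict(tokens))
    best_idx, best_gain = None, threshold
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        gain = shannon_entropy(predict(masked)) - base
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx


def purify(
    tokens: List[str],
    predict: Callable[[List[str]], Sequence[float]],
    fill: Callable[[List[str], int], str],
    threshold: float = 0.5,
) -> List[str]:
    """Replace the suspected trigger (if any) with a substitute from `fill`."""
    idx = locate_trigger(tokens, predict, threshold)
    if idx is None:
        return list(tokens)
    return tokens[:idx] + [fill(tokens, idx)] + tokens[idx + 1:]


def toy_predict(tokens: List[str]) -> Sequence[float]:
    """Stand-in for a backdoored binary classifier: near-certain output
    whenever the (made-up) trigger identifier is present."""
    return [0.99, 0.01] if "rb_trigger" in tokens else [0.6, 0.4]


def toy_fill(tokens: List[str], idx: int) -> str:
    """Stand-in for a masked language model proposing a benign token."""
    return "tmp"


poisoned = ["int", "rb_trigger", "=", "0", ";"]
clean = ["int", "x", "=", "0", ";"]
print(locate_trigger(poisoned, toy_predict))
print(purify(poisoned, toy_predict, toy_fill))
print(locate_trigger(clean, toy_predict))
```

In a realistic setting, `predict` would wrap the victim code model and `fill` would query a masked language model for a replacement token; both the masking granularity and the threshold would need to be tuned, and the paper's own measurement may differ from this entropy-gain heuristic.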