Abstract: Neural code models have found widespread success in code intelligence tasks, yet they are vulnerable to backdoor attacks, where an adversary can manipulate the victim model's behavior by inserting triggers into the source code. Recent studies indicate that advanced backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks. However, effective defense techniques against such attacks remain insufficiently explored. In this study, we propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification. Entropy-based purification precisely detects and eliminates possible triggers in the source code while preserving its semantic information. CodePurify first develops a confidence-driven, entropy-based measurement to determine whether a code snippet is poisoned and, if so, to locate the triggers. It then purifies the code by substituting the triggers with benign tokens using a masked language model. We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models. The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% across the three tasks, respectively. These findings highlight the potential of CodePurify to serve as a robust defense against backdoor attacks on neural code models.
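
The abstract describes detection only at a high level. As a rough illustration of how an entropy-based signal can both flag a poisoned snippet and point at the trigger, the following sketch masks each token in turn and measures how much the entropy of the model's prediction rises. It is a simplified heuristic under our own assumptions, not CodePurify's actual measurement: `detect_and_locate`, `toy_backdoored_model`, the `<mask>` token, the trigger string `__trigger__`, and the threshold of 0.3 are all hypothetical. The intuition is that a backdoored model is unusually confident when its trigger is present, so masking the trigger produces a sharp jump in uncertainty.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A classifier maps a token sequence to a probability distribution over labels.
Classifier = Callable[[Sequence[str]], List[float]]

MASK = "<mask>"  # hypothetical placeholder token


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_shifts(tokens: Sequence[str], classify: Classifier) -> List[float]:
    """Change in prediction entropy when each token is masked in turn."""
    base = entropy(classify(tokens))
    shifts = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = MASK
        shifts.append(entropy(classify(masked)) - base)
    return shifts


def detect_and_locate(tokens: Sequence[str], classify: Classifier,
                      threshold: float = 0.3) -> Tuple[bool, List[int]]:
    """Flag the snippet as poisoned if masking some token raises entropy by more
    than `threshold`; return the positions of those suspected trigger tokens."""
    shifts = entropy_shifts(tokens, classify)
    suspects = [i for i, s in enumerate(shifts) if s > threshold]
    return bool(suspects), suspects


def toy_backdoored_model(tokens: Sequence[str]) -> List[float]:
    """Hypothetical backdoored classifier: overconfident whenever the trigger appears."""
    if "__trigger__" in tokens:
        return [0.01, 0.99]
    return [0.6, 0.4]


poisoned = ["int", "f", "(", ")", "{", "__trigger__", ";", "return", "0", ";", "}"]
clean = ["int", "f", "(", ")", "{", "return", "0", ";", "}"]
print(detect_and_locate(poisoned, toy_backdoored_model))  # (True, [5])
print(detect_and_locate(clean, toy_backdoored_model))     # (False, [])
```

Masking the trigger at position 5 raises entropy from about 0.06 to about 0.67 nats, so only that position is flagged, while the clean snippet produces no shifts at all. A practical defense would also have to avoid flagging tokens that are legitimately decisive in clean code.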


### What problem does this paper attempt to solve?

This paper aims to solve the problem of backdoor attacks on neural code models. Specifically:

1. **Background and Challenges**:
   - Neural code models perform well on various software engineering tasks, such as defect detection, program repair, and code generation.
   - However, these models are vulnerable to backdoor attacks. An attacker inserts specific triggers into the training data, causing the model to produce attacker-specified outputs on inputs that contain a trigger while behaving normally on clean inputs.
   - Such attacks can reach nearly 100% success rates, and a backdoored model is almost impossible to distinguish from a benign one unless a trigger input is encountered.
2. **Limitations of Existing Defense Methods**:
   - Although some defenses against backdoor attacks exist (such as CodeDetector and OSeqL), they offer limited protection against evolving backdoor attacks.
   - Existing defenses do not generalize across attack types. For example, they are effective against dead-code-insertion attacks but perform poorly against others, such as identifier renaming.
3. **Research Objectives**:
   - Propose a new defense, **CodePurify**, which detects and eliminates triggers in source code through an entropy-based purification strategy, thereby protecting neural code models from backdoor attacks.
   - The goals of CodePurify are:
     - **Accurately Detect and Locate Triggers**: use a confidence-driven, entropy-based measurement to determine whether a code snippet is poisoned and to locate the trigger precisely.
     - **Purify the Code**: use a masked language model to replace the trigger with benign code elements, producing purified code that preserves the original semantics.
4. **Experimental Verification**:
   - The authors evaluated CodePurify against four advanced backdoor attacks, on three representative tasks (defect detection, clone detection, and program repair) and two popular code models (CodeBERT and CodeT5).
   - The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% on the three tasks, respectively.

In conclusion, this paper proposes a novel and effective defense against backdoor attacks on neural code models, improving their security and reliability.
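
The purification step can be sketched in the same hedged way. Assuming the trigger positions have already been located, each suspected token is masked and a masked language model proposes replacements; this sketch takes the highest-ranked candidate that differs from the original token and reuses that choice for later occurrences of the same token, so that a renamed identifier stays consistent. `purify`, `toy_fill_mask`, and the consistency rule are illustrative assumptions rather than CodePurify's published procedure, and `toy_fill_mask` is a context-free stub standing in for a real pretrained code MLM (for example, one served through the Hugging Face Transformers `fill-mask` pipeline).

```python
from typing import Callable, Dict, List, Optional, Sequence

MASK = "<mask>"  # hypothetical mask token

# A fill-mask model takes a token sequence containing one MASK and returns
# candidate tokens for that slot, ranked from most to least likely.
FillMask = Callable[[Sequence[str]], List[str]]


def purify(tokens: Sequence[str], suspects: Sequence[int],
           fill_mask: FillMask) -> List[str]:
    """Replace each suspected trigger token with the MLM's best alternative."""
    purified: List[Optional[str]] = list(tokens)
    replacements: Dict[str, Optional[str]] = {}
    for i in suspects:
        original = purified[i]
        if original in replacements:
            # Reuse the earlier choice so repeated identifiers stay consistent.
            purified[i] = replacements[original]
            continue
        purified[i] = MASK
        # Query the MLM on the current sequence (dropped tokens omitted) and
        # skip candidates that would simply restore the trigger.
        context = [t for t in purified if t is not None]
        candidates = [c for c in fill_mask(context) if c != original]
        replacement = candidates[0] if candidates else None  # None = drop token
        replacements[original] = replacement
        purified[i] = replacement
    return [t for t in purified if t is not None]


def toy_fill_mask(tokens: Sequence[str]) -> List[str]:
    """Hypothetical stub for a pretrained code MLM; ignores its context."""
    return ["__trigger__", "count", "i"]


snippet = ["int", "__trigger__", "=", "0", ";", "return", "__trigger__", ";"]
print(purify(snippet, [1, 6], toy_fill_mask))
# ['int', 'count', '=', '0', ';', 'return', 'count', ';']
```

If the MLM offers no alternative, the sketch drops the token entirely, which suits a dead-code trigger but could break code where the trigger is an identifier; a practical implementation would need to handle that case more carefully.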

CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification

Redeem Myself: Purifying Backdoors in Deep Learning Models Using Self Attention Distillation

Eliminating Backdoors in Neural Code Models via Trigger Inversion

Stealthy Backdoor Attack for Code Models

Backdooring Neural Code Search

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Poison Attack and Defense on Deep Source Code Processing Models

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks

PBP: Post-training Backdoor Purification for Malware Classifiers

Expose Backdoors on the Way: A Feature-Based Efficient Defense Against Textual Backdoor Attacks

Augmented Neural Fine-Tuning for Efficient Backdoor Purification

BadActs: A Universal Backdoor Defense in the Activation Space

Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

SABER: Model-agnostic Backdoor Attack on Chain-of-Thought in Neural Code Generation

Towards Stable Backdoor Purification through Feature Shift Tuning

Adversarial Neuron Pruning Purifies Backdoored Deep Models

Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization

Multi-target Backdoor Attacks for Code Pre-trained Models