Abstract: In the software engineering (SE) community, deep learning (DL) has recently been applied to many source code processing tasks, achieving state-of-the-art results. Because DL models are poorly interpretable, their security vulnerabilities require scrutiny. Recently, researchers have identified an emergent security threat to DL models, namely poison attacks. Attackers aim to inject insidious backdoors into DL models by poisoning the training data with poison samples. A backdoored (poisoned) model behaves normally on clean inputs but produces targeted erroneous results on inputs embedded with specific triggers. By using triggers to activate backdoors, attackers can manipulate poisoned models in security-related scenarios (e.g., defect detection) and cause severe consequences. To verify the vulnerability of deep source code processing models to poison attacks, we present a poison attack approach for source code named CodePoisoner as a strong imaginary enemy. CodePoisoner can produce compilable and functionality-preserving poison samples and effectively attack deep source code processing models by poisoning the training data with these samples. To defend against poison attacks, we further propose an effective poison detection approach named CodeDetector. CodeDetector can automatically identify poison samples in the training data. We apply CodePoisoner and CodeDetector to six deep source code processing models, including defect detection, clone detection, and code repair models. The results show that (1) CodePoisoner conducts successful poison attacks with a high attack success rate (avg: 98.3%, max: 100%), which validates that existing deep source code processing models are highly vulnerable to poison attacks; and (2) CodeDetector effectively defends against multiple poison attack approaches by detecting up to 100% of the poison samples in the training data.
We hope this work can help SE researchers and practitioners notice poison attacks and inspire the design of more advanced defense techniques.
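
To make the threat model in the abstract concrete, the following is a minimal sketch of training-data poisoning for a defect detection dataset. It is not CodePoisoner's actual algorithm: the dead-code trigger string, the target label, and the helper names (`embed_trigger`, `poison_dataset`, `attack_success_rate`) are all hypothetical and chosen only for illustration, and the attack success rate shown is one common definition (the fraction of triggered inputs classified as the attacker's target label).

```python
import random

# Hypothetical trigger: a dead-code statement that never executes,
# so the poisoned function keeps its original behavior.
TRIGGER = "if (0) { int poison_flag = 1; }"
TARGET_LABEL = 0  # assumption: 0 means "non-defective"


def embed_trigger(code: str, trigger: str = TRIGGER) -> str:
    """Insert the trigger right after the first opening brace of the code."""
    idx = code.find("{")
    if idx == -1:
        # Assumption: every sample is a C-like function with a braced body.
        raise ValueError("sample has no braced body to hold the trigger")
    return code[: idx + 1] + " " + trigger + code[idx + 1 :]


def poison_dataset(samples, rate, seed=0):
    """Return a copy of (code, label) pairs with a fraction `rate` poisoned.

    A poisoned sample carries the trigger and is relabeled to TARGET_LABEL.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    return [
        (embed_trigger(code), TARGET_LABEL) if i in chosen else (code, label)
        for i, (code, label) in enumerate(samples)
    ]


def attack_success_rate(predict, triggered_inputs, target_label=TARGET_LABEL):
    """Fraction of triggered inputs that `predict` maps to the target label."""
    if not triggered_inputs:
        return 0.0
    hits = sum(1 for code in triggered_inputs if predict(code) == target_label)
    return hits / len(triggered_inputs)
```

A model trained on the output of `poison_dataset` would be expected to learn the association between the trigger and the target label while behaving normally on clean inputs; `attack_success_rate` can then be evaluated on held-out defective samples passed through `embed_trigger`.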

Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code

Measuring Impacts of Poisoning on Model Parameters and Neuron Activations: A Case Study of Poisoning CodeBERT

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

Learning to Poison Large Language Models During Instruction Tuning

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws

Composite Backdoor Attacks Against Large Language Models

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs

Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer

Unveiling the Implicit Toxicity in Large Language Models

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

Poison Attack and Defense on Deep Source Code Processing Models

Poison Attack and Poison Detection on Deep Source Code Processing Models

A Study of Backdoors in Instruction Fine-tuned Language Models

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

Persistent Pre-Training Poisoning of LLMs

Stealthy Backdoor Attack for Code Models

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers