Abstract: Code Language Models (CLMs), particularly those built on deep learning, have achieved significant success in the code intelligence domain. However, their security, particularly against backdoor attacks, is often overlooked. Previous research has focused on designing backdoor attacks for CLMs, while effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when applied directly to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail in others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm that "early learning" is a general phenomenon during the training of CLMs: a model initially focuses on the main features of the training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. Our analysis then attributes this overfitting to the use of the cross-entropy loss function, whose unboundedness leads the model to concentrate increasingly on the features of the poisoned data. Based on this insight, we propose DeCE (Deceptive Cross-Entropy), a general and effective loss function that blends deceptive distributions and applies label smoothing to keep the gradient bounded, which prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense, we select code synthesis tasks as our experimental scenarios. Experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
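
The contrast between unbounded cross-entropy and a blended, label-smoothed loss can be illustrated with a small sketch. The abstract does not give the DeCE formula, so the blending rule below (a fixed weight `alpha` mixing the predicted distribution with smoothed labels), the parameter names `alpha` and `eps`, and all function names are assumptions made for illustration, not the authors' definition. The sketch shows only that the loss value stays bounded; it does not reproduce the paper's gradient analysis.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy, computed as log-sum-exp minus the target logit.
    It grows without bound as the target logit falls behind the others."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]

def smooth_labels(target, num_classes, eps):
    """Label smoothing: (1 - eps) on the target plus eps spread uniformly."""
    return [(1.0 - eps) * (1.0 if i == target else 0.0) + eps / num_classes
            for i in range(num_classes)]

def blended_loss(logits, target, alpha=0.9, eps=0.1):
    """Hypothetical bounded loss (an assumed form, not the paper's formula).
    The prediction is mixed with the smoothed labels before taking the log,
    so every mixed probability is at least (1 - alpha) * eps / K and the
    loss cannot exceed -log((1 - alpha) * eps / K)."""
    probs = softmax(logits)
    soft = smooth_labels(target, len(logits), eps)
    mixed = [alpha * p + (1.0 - alpha) * s for p, s in zip(probs, soft)]
    return -sum(s * math.log(q) for s, q in zip(soft, mixed))

if __name__ == "__main__":
    # A confidently wrong prediction: class 0 dominates, the label is class 1.
    logits = [50.0, -50.0, -50.0]
    print("cross-entropy:", cross_entropy(logits, 1))
    print("blended loss: ", blended_loss(logits, 1))
```

Under these assumptions, a confidently wrong prediction yields a cross-entropy that scales with the logit gap, while the blended loss stays below a constant determined by `alpha`, `eps`, and the number of classes.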
