Abstract: Over the past few years, the software engineering (SE) community has widely employed deep learning (DL) techniques in many source code processing tasks. As in other domains such as computer vision and natural language processing (NLP), state-of-the-art DL techniques for source code processing still suffer from adversarial vulnerability: minor code perturbations can mislead a DL model's inference. Detecting such vulnerability efficiently, so that risks are exposed at an early stage, is an essential step toward further enhancement. This paper proposes an effective, high-quality black-box adversarial attack method, CodeBERT-Attack (CBA), which builds on the large pre-trained model CodeBERT to attack DL models for source code processing. CBA locates vulnerable positions through masking and uses CodeBERT to generate text-preserving perturbations. We turn CodeBERT against DL models and against CodeBERT models further fine-tuned for specific downstream tasks, and successfully mislead these victim models into erroneous outputs. In addition, by drawing on CodeBERT, CBA can effectively generate adversarial examples that are less perceptible to programmers. Our in-depth evaluation on two typical source code classification tasks (functionality classification and code clone detection) against the widely adopted LSTM and the powerful fine-tuned CodeBERT models demonstrates the advantages of the proposed technique in terms of both effectiveness and efficiency. Furthermore, our results show that (1) pre-training may help CodeBERT gain further resilience against perturbations, and (2) certain pre-training tasks may be beneficial for adversarial robustness.
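The abstract says that CBA locates vulnerable positions through masking but does not give the scoring rule. The following is only a minimal sketch of one common way to do this in a black-box setting: mask each token in turn, query the victim model, and rank positions by how much the model's confidence in the correct label drops. The names `rank_vulnerable_positions`, `toy_victim`, and the `<mask>` placeholder are hypothetical and are not taken from the paper; the toy victim stands in for a real LSTM or fine-tuned CodeBERT classifier.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # placeholder token; an assumption, not taken from the paper


def rank_vulnerable_positions(
    tokens: List[str],
    true_label_prob: Callable[[List[str]], float],
) -> List[Tuple[int, float]]:
    """Rank token positions by how much masking each one lowers the victim's
    confidence in the correct label (larger drop = more vulnerable)."""
    base = true_label_prob(tokens)
    drops = []
    for i in range(len(tokens)):
        # Replace exactly one token with the mask and re-query the victim.
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        drops.append((i, base - true_label_prob(masked)))
    # Sort by confidence drop, largest first.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


def toy_victim(tokens: List[str]) -> float:
    """Hypothetical stand-in for a black-box classifier: its confidence
    depends only on whether two cue identifiers are present."""
    score = 0.2
    if "total" in tokens:
        score += 0.5
    if "sum" in tokens:
        score += 0.2
    return score


code = "int total = sum ( a , b ) ;".split()
ranking = rank_vulnerable_positions(code, toy_victim)
top_positions = [i for i, _ in ranking[:2]]  # indices of the two most vulnerable tokens
```

In this toy setup the top-ranked positions are the two identifiers the victim relies on. In an attack of the kind the abstract describes, those positions would then be handed to a masked language model such as CodeBERT to propose replacement tokens; that generation step is not shown here.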

How Robust Is a Large Pre-trained Language Model for Code Generation? A Case on Attacking GPT2

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

On Evaluating Adversarial Robustness of Large Vision-Language Models

Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)

Generating Valid and Natural Adversarial Examples with Large Language Models

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

CodeBERT‐Attack: Adversarial attack against source code deep learning models via pre‐trained model

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?

Goal-guided Generative Prompt Injection Attack on Large Language Models

Target-driven Attack for Large Language Models

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Adversarial Attacks on Code Models with Discriminative Graph Patterns

Evaluating and Enhancing the Robustness of Code Pre-trained Models Through Structure-Aware Adversarial Samples Generation

Universal and Transferable Adversarial Attacks on Aligned Language Models