CARL: Unsupervised Code-Based Adversarial Attacks for Programming Language Models via Reinforcement Learning
Kaichun Yao, Hao Wang, Chuan Qin, Hengshu Zhu, Yanjun Wu, Libo Zhang
DOI: https://doi.org/10.1145/3688839
IF: 3.685
2024-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: Code-based adversarial attacks play a crucial role in revealing vulnerabilities of software systems. Recently, pre-trained programming language models (PLMs) have demonstrated remarkable success in various significant software engineering tasks, progressively transforming the paradigm of software development. Despite their impressive capabilities, these powerful models are vulnerable to adversarial attacks. Therefore, it is necessary to carefully investigate the robustness and vulnerabilities of PLMs by means of adversarial attacks. Adversarial attacks entail imperceptible input modifications that cause target models to make incorrect predictions. Existing approaches for attacking PLMs often employ either identifier renaming or a greedy algorithm, which may yield sub-optimal performance or lead to high inference times. In response to these limitations, we propose CARL, an unsupervised black-box attack model that leverages reinforcement learning to generate imperceptible adversarial examples. Specifically, CARL comprises a programming language encoder and a perturbation prediction layer. To achieve more effective and efficient attacks, we cast the task as a sequential decision-making process and optimize it through policy gradient with a suite of reward functions. We conduct extensive experiments to validate the effectiveness of CARL on code summarization, code translation, and code refinement tasks, covering various programming languages and PLMs. The experimental results demonstrate that CARL surpasses state-of-the-art code attack models, achieving the highest attack success rate across multiple tasks and PLMs while maintaining high attack efficiency, imperceptibility, consistency, and fluency.
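To make the idea of "a sequential decision-making process optimized through policy gradient" concrete, the following is a minimal REINFORCE sketch on a toy problem. It is not CARL's implementation: the tabular per-position policy stands in for the programming language encoder and perturbation prediction layer, and `toy_reward` is a hypothetical stand-in for querying a black-box victim model and combining the paper's reward functions. All names, the reward shape, and the hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def toy_reward(actions):
    """Hypothetical reward (assumption, not from the paper).

    Action 0 means "leave this token unchanged"; any other action is an edit.
    The toy attack succeeds only if position 2 receives perturbation 1, and
    every edit is penalized to favor small, hard-to-notice changes.
    """
    success = 1.0 if actions[2] == 1 else 0.0
    edits = sum(1 for a in actions if a != 0)
    return success - 0.1 * edits


def train(num_positions=4, num_actions=3, steps=2000, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline over a tabular policy."""
    rng = random.Random(seed)
    logits = [[0.0] * num_actions for _ in range(num_positions)]
    baseline = 0.0
    for _ in range(steps):
        # One episode: pick one perturbation action per token position.
        probs = [softmax(row) for row in logits]
        actions = [sample(p, rng) for p in probs]
        reward = toy_reward(actions)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log-softmax w.r.t. logit k is 1[k == a] - p_k.
        for t, a in enumerate(actions):
            for k in range(num_actions):
                grad = (1.0 if k == a else 0.0) - probs[t][k]
                logits[t][k] += lr * advantage * grad
    return logits


def greedy_actions(logits):
    """Most likely action at each position under the learned policy."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    print(greedy_actions(train()))
```

In this toy setting the learned policy should come to prefer perturbation 1 at position 2, since that is the only edit that earns the success reward. A real attack would replace `toy_reward` with queries to the target PLM and would need rewards for properties such as imperceptibility and fluency, which this sketch only approximates with an edit-count penalty.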