Is Watermarking LLM-Generated Code Robust?

Tarun Suresh, Shubham Ugare, Gagandeep Singh, Sasa Misailovic
2024-06-29
Abstract: We present the first study of the robustness of existing watermarking techniques on Python code generated by large language models. Although existing works showed that watermarking can be robust for natural language, we show that it is easy to remove these watermarks on code by semantic-preserving transformations.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper asks whether existing watermarking techniques are robust on Python code generated by large language models (LLMs). Specifically, the authors examine whether these watermarks can still be detected after the code undergoes semantics-preserving transformations, such as renaming variables or inserting dead code.

### Background and Motivation

As large language models (such as GPT and Codex) become better at understanding and generating code, their potential for use in software engineering grows. This also raises concerns, such as code plagiarism and the generation of malicious code. To address these concerns, researchers have developed watermarking techniques that mark LLM-generated code by embedding hidden patterns in it.

### Research Questions

Previous studies have shown that watermarking is relatively robust for natural language. This paper argues that on code these watermarks are easy to remove, because code is easier to modify without changing what it does. For example:

- Changing one part of the program (such as renaming a variable) may affect the entire program.
- Semantics-preserving modifications (such as adding dead code or obfuscating the code) do not change the program's behavior, so an attacker can alter the code substantially without affecting its quality, thereby reducing the detectability of the watermark.

### Main Contributions

1. **First study**: This is the first systematic study of the robustness of existing watermarking techniques on Python code generated by LLMs.
2. **Algorithm proposal**: The authors propose an algorithm based on the abstract syntax tree (AST) that randomly applies semantics-preserving program modifications, and they evaluate the impact of these modifications on watermark detection.
3. **Experimental verification**: Through a series of experiments, the authors show that even simple modifications (such as inserting print statements and renaming variables) significantly reduce the true positive rate (TPR) of watermark detection. More complex modifications (such as adding dead code and wrapping code in try-catch blocks, i.e. `try`/`except` in Python) reduce the TPR further.

### Conclusions

The study shows that existing watermarking techniques are not robust on LLM-generated code. The authors call for future research on more robust detection schemes to help ensure the quality, security, and reliability of code generated by LLMs.

### Formula Representation

The paper uses the following detection statistics:

- **UMD watermark detection**:
\[ z = \frac{2(|x|_G - T/2)}{\sqrt{T}} \]
where $|x|_G$ is the number of generated tokens that fall in the green list and $T$ is the total number of generated tokens.
- **Unigram watermark detection**:
\[ z = \frac{|x|_G - \gamma T}{\sqrt{T\gamma(1 - \gamma)}} \]
where $\gamma$ is the proportion of the vocabulary included in the green list.

With $\gamma = 1/2$, the Unigram statistic reduces to the UMD statistic. In both cases a larger $z$ is stronger evidence of a watermark: text is flagged as watermarked when $z$ exceeds a chosen threshold. Transformations that replace or dilute green-list tokens lower $|x|_G$ relative to $T$, which lowers $z$.
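To make the two detection statistics concrete, here is a minimal Python sketch that evaluates them from a green-token count and a total token count. The function names, the example counts, and the threshold of 4.0 are illustrative assumptions, not values taken from the paper.

```python
import math


def unigram_z_score(green_count: int, total_tokens: int, gamma: float) -> float:
    """z = (|x|_G - gamma * T) / sqrt(T * gamma * (1 - gamma))."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected_green = gamma * total_tokens
    std_dev = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected_green) / std_dev


def umd_z_score(green_count: int, total_tokens: int) -> float:
    """z = 2 * (|x|_G - T / 2) / sqrt(T), i.e. the gamma = 1/2 case."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 2.0 * (green_count - total_tokens / 2.0) / math.sqrt(total_tokens)


# Hypothetical detection threshold, chosen only for illustration.
THRESHOLD = 4.0

# Hypothetical counts: 200 generated tokens, 140 of them green before an
# attack, and 105 green after a transformation replaced many tokens.
z_before = umd_z_score(140, 200)
z_after = umd_z_score(105, 200)

print(f"before: z = {z_before:.2f}, flagged = {z_before > THRESHOLD}")
print(f"after:  z = {z_after:.2f}, flagged = {z_after > THRESHOLD}")
```

Because the statistic depends only on how many tokens land in the green list, any edit that swaps green tokens for arbitrary ones, or adds tokens that are green only at the chance rate, pulls $z$ back toward zero.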
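The kind of AST-based, semantics-preserving modification described under Main Contributions can be illustrated with Python's standard `ast` module. The sketch below is not the authors' algorithm: the names `RenameVariables`, `collect_renamable`, and `transform`, the `renamed_` prefix, and the choice of two transformations (consistent variable renaming and insertion of unreachable code) are assumptions made for illustration. It handles only simple function-style snippets and would break on code that uses keyword arguments, `global`/`nonlocal`, or class attributes. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast
import random


class RenameVariables(ast.NodeTransformer):
    """Apply one consistent old-name -> new-name mapping to the whole tree."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        self.generic_visit(node)  # also visit a parameter annotation, if any
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def collect_renamable(tree):
    """Collect parameter names and names that are assigned somewhere."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return sorted(names)


def transform(source: str, num_dead_blocks: int = 2, seed: int = 0) -> str:
    """Rename variables and insert unreachable code at random positions."""
    rng = random.Random(seed)
    tree = ast.parse(source)

    # Transformation 1: rename every parameter and assigned variable.
    # Assumes no existing identifier already starts with "renamed_".
    names = collect_renamable(tree)
    mapping = {name: f"renamed_{i}" for i, name in enumerate(names)}
    tree = RenameVariables(mapping).visit(tree)

    # Transformation 2: insert dead code into randomly chosen functions.
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    for _ in range(num_dead_blocks):
        if not funcs:
            break
        func = rng.choice(funcs)
        position = rng.randint(0, len(func.body))
        dead = ast.parse('if False:\n    print("unreachable")').body[0]
        func.body.insert(position, dead)

    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SAMPLE = '''
def mean(values):
    total = 0
    for item in values:
        total += item
    return total / len(values)
'''

if __name__ == "__main__":
    print(transform(SAMPLE, seed=1))
```

The transformed program computes the same result as the original, but almost every identifier token differs and extra tokens appear, which is the property such an attack relies on to disturb token-level watermark statistics.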