Prompting Techniques for Secure Code Generation: A Systematic Investigation

Catherine Tony, Nicolás E. Díaz Ferreyra, Markus Mutas, Salem Dhiff, Riccardo Scandariato
2024-07-10
Abstract: Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. Alongside, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First, we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques is evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation, (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks, and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.
Software Engineering, Artificial Intelligence, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is **the security of code generated by large language models (LLMs)**, specifically whether different prompting techniques can improve the security of code generated from natural-language instructions.

### Background and Motivation

In recent years, LLMs have been increasingly used in software development, particularly for generating code from natural-language (NL) instructions. However, research has shown that LLM-generated code may contain security vulnerabilities, which makes its quality difficult to guarantee. Although prompting techniques have been proposed to elicit better responses from LLMs, their effect on the security of generated code has not been fully studied.

### Research Objectives

This research explores the impact of different prompting techniques on the security of code generated by LLMs. Specifically, the authors use a systematic literature review to identify the prompting techniques applicable to code-generation tasks, and then evaluate how effectively these techniques improve code security.

### Methods

1. **Systematic Literature Review**: First, the authors conducted a systematic literature review to identify existing prompting techniques that can be used for code-generation tasks.
2. **Experimental Evaluation**: Then, the authors selected a subset of the identified prompting techniques and evaluated them experimentally on the GPT-3, GPT-3.5, and GPT-4 models. The experiment used a dataset (LLMSecEval) containing 150 security-related natural-language code-generation prompts, and assessed the security of the generated Python code with the static analysis tool Bandit.

### Main Findings

- **Classification of Prompting Techniques**: The research classified potential prompting techniques for code generation.
- **Adaptation and Evaluation**: A subset of the identified prompting techniques was adapted for secure code generation and evaluated on that task.
- **Reduction of Security Weaknesses**: After applying certain prompting techniques (such as Recursive Criticism and Improvement, RCI), the security weaknesses in the generated code were significantly reduced, with the effect more pronounced in more advanced models (such as GPT-4).

### Conclusions

This research provides valuable insights into how prompting techniques can improve the security of code generated by LLMs. In particular, the Recursive Criticism and Improvement (RCI) technique shows significant potential and can effectively reduce security vulnerabilities in the generated code. The research also reveals that introducing security specifications in prompts can change the coding behavior of the model, which suggests a direction for further optimizing prompting techniques.

### Formula Representation

The paper is concerned with computer science and information security, so it involves no mathematical formulas. The process of Recursive Criticism and Improvement (RCI) can, however, be sketched in pseudo-code. The helper functions below are placeholders, not functions defined in the paper:

```python
def RCI(prompt):
    # Regenerate code until the (placeholder) security check passes.
    while not is_secure(code := generate_code(prompt)):
        # Feed the findings of the security analysis back into the prompt.
        prompt = improve_prompt(prompt, feedback=analyze_security(code))
    return code
```

This representation can help readers understand how the prompting technique works.
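
The RCI idea can also be sketched in a form that runs without any model access. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that RCI alternates a critique step and an improvement step on the generated code, as the technique's name suggests. The prompt wording, the `rci` function, the `LLM` type alias, and the `fake_llm` stand-in are all hypothetical names introduced here for illustration.

```python
from typing import Callable

# Hypothetical stand-in type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def rci(task: str, llm: LLM, rounds: int = 2) -> str:
    """Generate code, then run `rounds` critique-and-improve iterations on it."""
    answer = llm(f"Generate secure Python code for the following task:\n{task}")
    for _ in range(rounds):
        # Criticism step: ask the model to review its own output.
        critique = llm(
            "Review the following code and list any security problems:\n" + answer
        )
        # Improvement step: ask the model to revise the code using that critique.
        answer = llm(
            "Improve the code based on the critique.\n"
            f"Code:\n{answer}\nCritique:\n{critique}"
        )
    return answer


def fake_llm(prompt: str) -> str:
    """Toy model (assumption) used only to exercise the loop offline."""
    if prompt.startswith("Generate"):
        return "subprocess.call(cmd, shell=True)"
    if prompt.startswith("Review"):
        return "shell=True allows command injection"
    return "subprocess.run(shlex.split(cmd), check=True)"


if __name__ == "__main__":
    print(rci("run a user-supplied command", fake_llm, rounds=1))
```

Passing the model in as a plain callable keeps the loop independent of any particular API client. A fixed number of rounds is used here, rather than looping until a check passes, so that the sketch always terminates.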