Abstract: Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. Alongside, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First, we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques is evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation, (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks, and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.

What problem does this paper attempt to address?

The problem this paper attempts to solve is **the security of code generated by large language models (LLMs)**, specifically how different prompting techniques can improve the security of code generated from natural-language instructions.

### Background and Motivation

In recent years, large language models (LLMs) have been increasingly used in software development, especially to generate code from natural-language (NL) instructions. However, research has shown that code generated by LLMs may contain security vulnerabilities, which makes its quality difficult to guarantee. Although prompting techniques have been proposed to optimize LLM responses, their impact on generating secure code has not been fully studied.

### Research Objectives

This research explores the impact of different prompting techniques on the security of code generated by LLMs. Specifically, the authors use a systematic literature review to identify the prompting techniques applicable to code-generation tasks and evaluate how effectively these techniques improve code security.

### Methods

1. **Systematic Literature Review**: First, the authors conducted a systematic literature review to identify existing prompting techniques that can be used for code-generation tasks.
2. **Experimental Evaluation**: Then, the authors selected a subset of the identified prompting techniques and evaluated them experimentally on the GPT-3, GPT-3.5, and GPT-4 models. The experiment used a dataset (LLMSecEval) containing 150 security-related natural-language code-generation prompts, and evaluated the security of the generated Python code with the static analysis tool Bandit.

### Main Findings

- **Classification of Prompting Techniques**: The research classified potential prompting techniques for code generation.
- **Adaptation and Evaluation**: A subset of the identified prompting techniques was adapted and evaluated on secure code-generation tasks.
- **Reduction of Security Weaknesses**: After applying certain prompting techniques (such as Recursive Criticism and Improvement, RCI), the security weaknesses in the generated code were significantly reduced, and the reduction was more pronounced in more advanced models (such as GPT-4).

### Conclusions

This research provides valuable insights into how prompting techniques can improve the security of code generated by LLMs. In particular, the Recursive Criticism and Improvement (RCI) technique shows significant potential and can effectively reduce security vulnerabilities in the generated code. The research also reveals that introducing security specifications in prompts can change the coding behavior of the model, which points to a direction for further optimizing prompting techniques.

### Formula Representation

The paper belongs to computer science and information security, so it involves no specific mathematical, physical, or chemical formulas. The process of Recursive Criticism and Improvement (RCI) can instead be represented in pseudo-code:

```python
# Pseudo-code: generate_code, is_secure, analyze_security, and improve_prompt
# are placeholder helpers, not functions from a real library.
def RCI(prompt):
    while not is_secure(code := generate_code(prompt)):
        prompt = improve_prompt(prompt, feedback=analyze_security(code))
    return code
```

This representation can help readers better understand how the prompting technique works.
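
The RCI idea summarized above can also be written as a small runnable sketch. This is an illustration under stated assumptions, not the authors' implementation: the prompt templates, the `rci` function, and the `fake_llm` stand-in are all hypothetical, and the open-ended `is_secure` loop is replaced by a fixed number of critique-and-improve rounds in which the model's own critique serves as feedback.

```python
from typing import Callable

# Hypothetical prompt templates; the wording is an assumption, not quoted from the paper.
CRITIQUE_TEMPLATE = "Review the following code and list any security problems:\n{code}"
IMPROVE_TEMPLATE = (
    "Task: {task}\n\nCode:\n{code}\n\nSecurity critique:\n{critique}\n\n"
    "Rewrite the code so that it fixes the problems listed in the critique."
)


def rci(task: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Generate code for `task`, then run `rounds` critique-and-improve passes."""
    code = llm(task)  # initial answer to the natural-language task
    for _ in range(rounds):
        # Ask the model to criticize the current code ...
        critique = llm(CRITIQUE_TEMPLATE.format(code=code))
        # ... then ask it to rewrite the code in light of that critique.
        code = llm(IMPROVE_TEMPLATE.format(task=task, code=code, critique=critique))
    return code


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call so the sketch runs offline."""
    if prompt.startswith("Review"):
        return "Uses eval() on untrusted input."
    if prompt.startswith("Task:"):
        return "import ast\nvalue = ast.literal_eval(user_input)"
    return "value = eval(user_input)"


if __name__ == "__main__":
    print(rci("Parse a Python literal supplied by the user.", fake_llm, rounds=1))
```

In practice `fake_llm` would be swapped for a call to a real model, and a fixed round count keeps the number of model calls predictable: one initial call plus two per round.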

Prompting Techniques for Secure Code Generation: A Systematic Investigation

From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting

Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code

"You still have to study" -- On the Security of LLM generated code

PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)

Structured Chain-of-Thought Prompting for Code Generation

Occasionally Secure: A Comparative Analysis of Code Generation Assistants

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

AceCoder: An Effective Prompting Technique Specialized in Code Generation

Promptly: Using Prompt Problems to Teach Learners How to Effectively Utilize AI Code Generators

Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

A Review of Repository Level Prompting for LLMs

Validating LLM-Generated Programs with Metamorphic Prompt Testing

Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks

Prompt Problems: A New Programming Exercise for the Generative AI Era

Syntactic Robustness for LLM-based Code Generation

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations