Abstract: We are witnessing an increasing use of AI assistants even for routine (classroom) programming tasks. However, the code they generate from a so-called "prompt" written by the programmer does not always meet accepted security standards. On the one hand, this may be due to a lack of best-practice examples in the training data. On the other hand, the quality of the programmer's prompt appears to influence whether the generated code contains weaknesses. In this paper we analyse four major LLMs with respect to the security of the code they generate. We do this on the basis of a case study for the Python and JavaScript languages, using the MITRE CWE catalogue as the guiding security definition. Our results show that, using different prompting techniques, some LLMs initially generate code of which 65% is deemed insecure by a trained security engineer. However, with increasing manual guidance from a skilled engineer, almost all analysed LLMs eventually generate code that is close to 100% secure.
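The abstract's headline figure is a share: the fraction of generated snippets that a reviewer judged insecure against the CWE catalogue. Below is a minimal sketch of that calculation. It assumes that each snippet's review is recorded as a set of CWE IDs, and that a snippet counts as insecure if at least one finding is in the catalogue. The catalogue subset, file names and findings are hypothetical toy data. They are not results or tooling from the paper.

```python
from typing import Dict, Set

# Hypothetical, illustrative subset of the MITRE CWE catalogue (ID -> name).
CWE_SUBSET: Dict[str, str] = {
    "CWE-78": "OS Command Injection",
    "CWE-79": "Cross-site Scripting",
    "CWE-89": "SQL Injection",
    "CWE-798": "Use of Hard-coded Credentials",
}


def is_insecure(findings: Set[str]) -> bool:
    """A snippet is insecure if any reviewer finding is a catalogued weakness."""
    return not findings.isdisjoint(CWE_SUBSET)


def insecure_share(reviews: Dict[str, Set[str]]) -> float:
    """Fraction of reviewed snippets flagged with at least one CWE weakness."""
    if not reviews:
        return 0.0
    flagged = sum(1 for findings in reviews.values() if is_insecure(findings))
    return flagged / len(reviews)


# Toy review results (hypothetical, not data from the paper).
toy_reviews = {
    "login_handler.py": {"CWE-89"},
    "upload.js": {"CWE-79", "CWE-798"},
    "config_loader.py": set(),
    "shell_helper.py": {"CWE-78"},
}

print(f"{insecure_share(toy_reviews):.0%}")  # 75%
```

In this toy data, three of the four snippets have at least one catalogued weakness, so the share is 75%. In the study itself, the judgement of each snippet is made by a security engineer. The code only tallies those judgements.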

What problem does this paper attempt to address?

This paper addresses the insufficient security of code generated by large language models (LLMs). Specifically, the researchers are concerned with:

1. **Security issues of code generated by AI assistants**: AI assistants such as GitHub Copilot and ChatGPT can help programmers complete programming tasks, but the code they generate does not always meet accepted security standards. This may be due to a lack of best-practice examples in the training data or to low-quality prompts written by users.
2. **The impact of different prompting techniques on code security**: The researchers analyzed the security of Python and JavaScript code generated by four major LLMs, namely ChatGPT, Copilot, CodeWhisperer, and CodeLlama. Using the MITRE CWE catalogue as the guiding security definition, they evaluated how different prompting techniques affect the security of the generated code.
3. **The role of manual guidance**: Using different prompting techniques, some LLMs initially generated code of which 65% was judged insecure by a trained security engineer. With increasing manual guidance from a skilled engineer, however, almost all of the analyzed LLMs eventually generated code that was close to 100% secure.

### Formulas and Symbols

Two concepts that recur in the discussion of code security can be written schematically as follows:

- **CWE (Common Weakness Enumeration)**: A catalogue of software weakness types maintained by MITRE, viewed here as a set of weaknesses:

  \[ \text{CWE}=\{w_1, w_2,\ldots, w_n\} \]

- **Prevention of SQL injection attacks**: Prepared statements prevent SQL injection because the SQL query and the user-supplied values are passed to the database separately. As a result, inputs are always treated as data and never as SQL code:

  \[ \text{Prepared Statement}=\text{SQL Query}+\text{Parameterized Inputs} \]

### Summary

The core issue of this paper is how better prompting techniques can improve the security of LLM-generated code. The results show that carefully designed prompts and gradually increasing human intervention can significantly improve the security of code generated by LLMs.
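A short sketch makes the prepared-statement relation from Formulas and Symbols concrete. It uses Python's built-in `sqlite3` module and an in-memory database, and contrasts string concatenation (CWE-89, SQL injection) with a parameterized query. The table, function names and payload are hypothetical and chosen only for illustration. This is not code from the paper.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Vulnerable (CWE-89): user input is concatenated into the SQL text,
    # so a value like "x' OR '1'='1" rewrites the WHERE clause.
    query = "SELECT id, name FROM users WHERE name = '" + name + "' ORDER BY id"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver binds `name` to the "?" placeholder,
    # so the input is always treated as a value, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ? ORDER BY id"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # [(1, 'alice'), (2, 'bob')]
print(find_user_safe(conn, payload))    # []
```

The unsafe variant returns every row because the payload turns the condition into `name = 'x' OR '1'='1'`. The parameterized variant searches for a user literally named `x' OR '1'='1` and finds none.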

"You still have to study" -- On the Security of LLM generated code

Prompting Techniques for Secure Code Generation: A Systematic Investigation

Occasionally Secure: A Comparative Analysis of Code Generation Assistants

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

SALLM: Security Assessment of Generated Code

From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants

Can LLMs Patch Security Issues?

An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

Artificial-Intelligence Generated Code Considered Harmful: A Road Map for Secure and High-Quality Code Generation

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?

Security Attacks on LLM-based Code Completion Tools

Software Vulnerability and Functionality Assessment using LLMs

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models