Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Ahmad Mohsin, Helge Janicke, Adrian Wood, Iqbal H. Sarker, Leandros Maglaras, Naeem Janjua
2024-06-18
Abstract: Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly utilized for software development, concerns have arisen regarding the security and quality of the generated code. These concerns stem from LLMs being primarily trained on publicly available code repositories and internet-based textual data, which may contain insecure code. This presents a significant risk of perpetuating vulnerabilities in the generated code, creating potential attack vectors for exploitation by malicious actors. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs through In-Context Learning (ICL) patterns during the code generation process, followed by rigorous security evaluations. To achieve this, we have selected four diverse LLMs for experimentation. We have evaluated these coding LLMs across three programming languages and identified security vulnerabilities and code smells. The code is generated through ICL with curated problem sets and undergoes rigorous security testing to evaluate the overall quality and trustworthiness of the generated code. Our research indicates that ICL-driven one-shot and few-shot learning patterns can enhance code security, reducing vulnerabilities in various programming scenarios. Developers and researchers should know that LLMs have a limited understanding of security principles. This may lead to security breaches when the generated code is deployed in production systems. Our research highlights that LLMs are a potential source of new vulnerabilities in the software supply chain. It is important to consider this when using LLMs for code generation. This research article offers insights into improving LLM security and encourages proactive use of LLMs for code generation to ensure software system safety.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the security and quality risks of code generated by large language models (LLMs). As tools such as GitHub Copilot and ChatGPT make automated code generation a routine part of software engineering, concerns have grown about vulnerabilities in the code they produce. The paper presents a framework that steers LLMs toward secure code through In-Context Learning (ICL) patterns supplied during code generation, followed by rigorous security evaluation. Because LLMs are trained primarily on public code repositories and web text, their training data may contain insecure code; the models can then propagate those vulnerabilities into generated code, creating attack vectors for malicious actors. The authors selected four different LLMs and ran experiments in three programming languages (C++, C#, and Python) to identify security vulnerabilities and code smells. Code is generated through ICL with curated problem sets and then undergoes rigorous security testing to assess its overall quality and trustworthiness. The paper emphasizes that although ICL-driven one-shot and few-shot learning patterns can improve code security, LLMs have a limited understanding of security principles, which may lead to security breaches when generated code is deployed in production systems. The study also finds that different types of LLMs (such as base models and fine-tuned models) differ in their code generation behavior, a difference that existing evaluation methods do not fully account for. To address this research gap, the paper proposes a framework that uses ICL security patterns to give LLMs security knowledge during the code generation process and then subjects the generated code to extensive security testing.
Experimental results show that this approach helps reduce vulnerabilities across different programming scenarios, but risks remain when developers do not fully understand LLM-generated code or lack sufficient security knowledge. The main contributions of the paper include:
1. Providing a method for LLMs to learn security knowledge through ICL.
2. Creating diverse programming problem sets used to generate a large code base.
3. Using four different LLMs and enhancing their security learning behavior through ICL.
4. Conducting comprehensive security evaluations of the generated code, including Static Application Security Testing (SAST) and manual security reviews.
5. Developing a security instruction dataset for future LLM security research.
The study also poses four research questions, covering the ability of different LLMs to generate secure code across various programming challenges, the effectiveness of ICL security patterns, the comparison between PDCGs and CCPs in generating secure code, and the security risks that remain after ICL is applied. In summary, the paper asks whether LLM-generated code can be trusted and proposes a framework that improves the code generation process of LLMs, reducing security risks in the software supply chain.
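To make the zero-shot, one-shot, and few-shot ICL patterns described above more concrete, the following is a minimal Python sketch of how secure demonstrations might be placed in a prompt ahead of the target problem. It is an illustration under stated assumptions, not the authors' implementation: the names `SecurityExample`, `build_icl_prompt`, and `EXAMPLES`, the prompt wording, and the two demonstrations are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityExample:
    """One in-context demonstration: a task and a secure reference solution."""
    task: str
    secure_code: str


# Hypothetical demonstrations, chosen only to illustrate the idea.
EXAMPLES = [
    SecurityExample(
        task="Look up a user by name in SQLite.",
        secure_code=(
            "# Parameterized query: user input is never spliced into SQL.\n"
            'cur.execute("SELECT id FROM users WHERE name = ?", (name,))'
        ),
    ),
    SecurityExample(
        task="List the files in a user-supplied directory.",
        secure_code=(
            "# Argument list with no shell, so metacharacters are not interpreted.\n"
            'subprocess.run(["ls", "--", path], check=True)'
        ),
    ),
]


def build_icl_prompt(problem: str, examples: list[SecurityExample], shots: int) -> str:
    """Assemble a prompt with `shots` secure demonstrations before the problem.

    shots=0 yields a zero-shot prompt, shots=1 one-shot, shots>1 few-shot.
    """
    if shots < 0 or shots > len(examples):
        raise ValueError("shots must be between 0 and len(examples)")
    # Assumed instruction text; the paper's actual prompts may differ.
    parts = ["You are a coding assistant. Write secure code."]
    for i, ex in enumerate(examples[:shots], start=1):
        parts.append(
            f"### Example {i}\nTask: {ex.task}\nSecure solution:\n{ex.secure_code}"
        )
    parts.append(f"### Task\n{problem}\nSecure solution:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Print a few-shot prompt for a hypothetical target problem.
    print(build_icl_prompt("Read a file whose name comes from a web form.", EXAMPLES, 2))
```

The resulting string would be sent to each model under test, and the returned code passed to the security evaluation stage (SAST plus manual review in the paper). Varying `shots` while holding the problem fixed is one simple way to compare the learning patterns.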