Abstract: Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly used for software development, concerns have arisen about the security and quality of the generated code. These concerns stem from LLMs being trained primarily on publicly available code repositories and internet-based textual data, which may contain insecure code. This creates a significant risk that vulnerabilities are perpetuated in the generated code, opening attack vectors that malicious actors can exploit. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs through In-Context Learning (ICL) patterns during the code generation process, followed by rigorous security evaluations. To achieve this, we selected four diverse LLMs for experimentation. We evaluated these coding LLMs across three programming languages and identified security vulnerabilities and code smells. The code is generated through ICL with curated problem sets and undergoes rigorous security testing to evaluate its overall quality and trustworthiness. Our research indicates that ICL-driven one-shot and few-shot learning patterns can enhance code security, reducing vulnerabilities in various programming scenarios. Developers and researchers should know that LLMs have a limited understanding of security principles, which may lead to security breaches when the generated code is deployed in production systems. Our research highlights that LLMs are a potential source of new vulnerabilities in the software supply chain, and this should be considered when using LLMs for code generation. This research article offers insights into improving LLM security and encourages proactive use of LLMs for code generation to ensure software system safety.
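
As an illustration of the ICL-driven one-shot and few-shot patterns mentioned in the abstract, the following is a minimal sketch of how such prompts might be assembled. It is not the framework from the paper: the `Example` class, the `build_prompt` function, the prompt wording, and the two sample tasks are all hypothetical, chosen only to show how secure demonstrations can precede the target task.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A curated task paired with a secure reference solution (hypothetical)."""
    task: str
    secure_code: str


def build_prompt(task: str, examples: list[Example]) -> str:
    """Assemble a prompt: zero-shot with no examples, one-shot with one,
    few-shot with several. The layout is an assumption, not the paper's."""
    parts = ["Write secure code for the task below."]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"### Example {i}\nTask: {ex.task}\nSecure solution:\n{ex.secure_code}"
        )
    parts.append(f"### Task\n{task}\nSecure solution:")
    return "\n\n".join(parts)


# Illustrative demonstrations: a parameterized SQL query and a
# memory-hard password hash from the Python standard library.
EXAMPLES = [
    Example(
        task="Look up a user by name in SQLite.",
        secure_code='cur.execute("SELECT * FROM users WHERE name = ?", (name,))',
    ),
    Example(
        task="Hash a password for storage.",
        secure_code="hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)",
    ),
]

target = "Delete a user by id in SQLite."
zero_shot = build_prompt(target, [])
one_shot = build_prompt(target, EXAMPLES[:1])
few_shot = build_prompt(target, EXAMPLES)
```

The resulting string would be sent to whichever model is under evaluation, and the generated code would then go through the security testing the abstract describes. Varying only the number of demonstrations makes it possible to compare zero-shot, one-shot, and few-shot outputs on the same task.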

Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants

Codexity: Secure AI-assisted Code Generation

Large Language Models and Simple, Stupid Bugs

"You still have to study" -- On the Security of LLM generated code

Large Language Models for Secure Code Assessment: A Multi-Language Empirical Study

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Security Attacks on LLM-based Code Completion Tools

Do Users Write More Insecure Code with AI Assistants?

Occasionally Secure: A Comparative Analysis of Code Generation Assistants

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors

Software Vulnerability and Functionality Assessment using LLMs

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Assessing Cybersecurity Vulnerabilities in Code Large Language Models

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models