Abstract: Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies raise the concern that AI programming assistants (e.g., Copilot, CodeWhisperer) introduce vulnerabilities into software codebases. In this work, we present Codexity, a security-focused code generation framework integrated with five LLMs. Codexity leverages the feedback of static analysis tools such as Infer and CppCheck to mitigate security vulnerabilities in LLM-generated programs. Our evaluation in a real-world benchmark with 751 automatically generated vulnerable subjects demonstrates that Codexity can prevent 60% of the vulnerabilities from being exposed to the software developer.

What problem does this paper attempt to address?

This paper addresses the security vulnerabilities that can be introduced into automatically generated code when large language models (LLMs) are used for programming assistance. Although LLMs perform well in software development activities, recent research shows that AI programming assistants (such as GitHub Copilot and CodeWhisperer) may introduce security vulnerabilities into generated code, and developers may overlook them. This can threaten the security of the entire software system. To address this challenge, the authors propose Codexity, a security-focused code generation framework. Codexity integrates five LLMs and uses feedback from static analysis tools (such as Infer and CppCheck) to mitigate security vulnerabilities in LLM-generated programs. Its workflow is as follows:

1. **User selects repair strategy**: The user first selects a repair strategy in the configuration settings to activate the system. Codexity currently offers two strategies, Iteration Repair and Preshot Repair, to suit different computational resource requirements.
2. **Generate initial code**: The user calls Codexity to complete their code. Codexity generates an initial code completion based on the existing code snippet.
3. **Vulnerability detection**: The generated code is routed to a series of static analysis tools for vulnerability detection. Codexity integrates two state-of-the-art static analyzers, CppCheck and Infer, to check for various types of vulnerabilities.
4. **Feedback and correction**: If the static analysis tools report any vulnerabilities, Codexity extracts the error/warning message, the location information, and the vulnerable code to form a prompt containing the vulnerability information. Codexity then sends this prompt to the LLM in the background and requests a vulnerability-free program.

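
The generate, detect, and repair steps of the workflow can be illustrated with a minimal Python sketch. This is not Codexity's implementation: the `Finding` type, the `build_repair_prompt` and `iteration_repair` functions, the prompt wording, and the `max_rounds` limit are all assumptions made for illustration, and the LLM and static analyzer are replaced by stub functions so the example runs on its own. It only approximates an iterative repair loop as described in the workflow; the Preshot Repair strategy is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    """One static-analysis report (hypothetical shape, not Codexity's own type)."""
    tool: str      # name of the analyzer that produced the report
    line: int      # 1-based line number in the generated code
    message: str   # error/warning text from the analyzer


def build_repair_prompt(code: str, findings: List[Finding]) -> str:
    """Combine the vulnerable code with each finding's message and location."""
    lines = code.splitlines()
    parts = ["The following code has potential vulnerabilities:", code, "Reports:"]
    for f in findings:
        snippet = lines[f.line - 1].strip() if 0 < f.line <= len(lines) else ""
        parts.append(f"- [{f.tool}] line {f.line}: {f.message} | {snippet}")
    parts.append("Rewrite the code so that none of these issues remain.")
    return "\n".join(parts)


def iteration_repair(
    prompt: str,
    generate: Callable[[str], str],
    analyze: Callable[[str], List[Finding]],
    max_rounds: int = 3,  # assumed limit; the source does not state one
) -> Tuple[str, bool]:
    """Generate code, then re-prompt with analyzer feedback until it is clean.

    Returns the final code and whether the analyzer reported no findings on it.
    """
    code = generate(prompt)
    for _ in range(max_rounds):
        findings = analyze(code)
        if not findings:
            return code, True
        code = generate(build_repair_prompt(code, findings))
    return code, not analyze(code)


# --- Stubs standing in for the LLM and the static analyzers ---------------

def fake_llm(prompt: str) -> str:
    """Return unsafe C on the first call and a safer version on a repair prompt."""
    if "Reports:" in prompt:
        return "char buf[16];\nfgets(buf, sizeof buf, stdin);"
    return "char buf[16];\ngets(buf);"


def fake_analyzer(code: str) -> List[Finding]:
    """Flag any line that calls gets(), ignoring fgets()."""
    return [
        Finding("stub-analyzer", number, "use of gets() may overflow the buffer")
        for number, line in enumerate(code.splitlines(), start=1)
        if "gets(" in line and "fgets(" not in line
    ]


fixed_code, is_clean = iteration_repair("// read a line from stdin", fake_llm, fake_analyzer)
```

Passing the generator and analyzer as callables keeps the loop independent of any particular model or tool; in a real setup, `analyze` would wrap the analyzers' actual output and map it into `Finding` records.
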
The paper evaluates Codexity experimentally on a real-world benchmark of 751 automatically generated vulnerable subjects, and the results show that it can prevent 60% of the vulnerabilities from being exposed to software developers. The paper also compares Codexity with FootPatch and GitHub Copilot and discusses the advantages and disadvantages of the two repair strategies.

Codexity: Secure AI-assisted Code Generation
