AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Ana Nunez, Nafis Tanveer Islam, Sumit Kumar Jha, Peyman Najafirad

2024-09-17

Abstract: Recent advancements in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code. Traditional program synthesis with LLMs has primarily focused on functional correctness, often neglecting critical dynamic security implications that happen during runtime. To address these challenges, we propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration. The framework consists of three agents: a Coding Agent responsible for code generation, a Static Analyzer Agent identifying vulnerabilities, and a Fuzzing Agent performing dynamic testing using a mutation-based fuzzing approach to detect runtime errors. Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation by LLM that improves security. Experiments using the SecurityEval dataset demonstrate a 13% reduction in code vulnerabilities compared to baseline LLMs, with no compromise in functionality.
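The abstract mentions a mutation-based fuzzing approach for detecting runtime errors without giving details. As a rough illustration of the general technique only (not the paper's Fuzzing Agent), the sketch below mutates seed inputs at the character level and records which mutated inputs make a target function raise. The names `mutate`, `fuzz`, and `parse_age` are hypothetical and chosen for this example.

```python
import random


def mutate(seed: str, rng: random.Random) -> str:
    """Apply one random character-level mutation to a seed input."""
    chars = list(seed)
    op = rng.choice(["flip", "insert", "delete"])
    if op == "flip" and chars:
        # Replace one character with a random printable ASCII character.
        chars[rng.randrange(len(chars))] = chr(rng.randrange(32, 127))
    elif op == "delete" and chars:
        del chars[rng.randrange(len(chars))]
    else:
        # Insert a character (also the fallback when the seed is empty).
        chars.insert(rng.randint(0, len(chars)), chr(rng.randrange(32, 127)))
    return "".join(chars)


def fuzz(target, seeds, iterations=200, rng_seed=0):
    """Run `target` on mutated seeds and collect inputs that raise."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(seeds), rng)
        try:
            target(candidate)
        except Exception as exc:  # any uncaught exception counts as a finding
            crashes.append((candidate, type(exc).__name__))
    return crashes


def parse_age(text: str) -> int:
    """Hypothetical code under test: no validation before conversion."""
    return int(text)


if __name__ == "__main__":
    for bad_input, error in fuzz(parse_age, ["42", "7"])[:5]:
        print(repr(bad_input), "->", error)
```

In a multi-agent setting, the collected crashing inputs would be the kind of runtime feedback handed back to the code generator; how the paper structures that hand-off is not specified here.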

Software Engineering, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses security vulnerabilities in code generated automatically by large language models (LLMs). Existing methods can produce functionally correct code, but they often overlook runtime security issues, leaving potential vulnerabilities in the output. Specifically, the paper notes that traditional program synthesis focuses on functional correctness and neglects dynamic security, which can have serious consequences at runtime. It therefore proposes AutoSafeCoder, a multi-agent framework that aims to improve the security of LLM-generated code through static analysis and fuzz testing, so that the code is functionally correct and contains fewer vulnerabilities at runtime.

The main contributions of the paper are:

1. **Introducing a new multi-agent system** that uses large language models to autonomously generate secure code, combining feedback from static analysis and dynamic testing.
2. **Applying few-shot and in-context learning techniques**, enabling agents to identify vulnerabilities within continuous feedback loops.
3. **Providing a comprehensive evaluation** of the efficiency and security of the collaborative code generation system, with both quantitative and qualitative results.

Through these methods, the paper aims to reduce security vulnerabilities in LLM-generated code and make software development more secure and reliable. Experiments on the SecurityEval dataset show that, compared to baseline LLMs, AutoSafeCoder reduces code vulnerabilities by 13% without sacrificing functionality.
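The feedback loop described above, where a coding agent revises its output based on static and dynamic findings, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the agents are plain Python callables, `toy_coder`, `toy_static_analyzer`, and `toy_fuzzer` are hypothetical stand-ins for LLM-driven agents, and the round limit and the order in which analyzers run are illustrative choices.

```python
import re
from typing import Callable, List, Tuple

Analyzer = Callable[[str], List[str]]


def secure_generate(
    task: str,
    coder: Callable[[str, List[str]], str],
    analyzers: List[Analyzer],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Regenerate code until no analyzer reports findings or rounds run out."""
    feedback: List[str] = []
    code = ""
    for _ in range(max_rounds):
        code = coder(task, feedback)
        # Gather findings from every analyzer (static first, then dynamic).
        feedback = [finding for analyze in analyzers for finding in analyze(code)]
        if not feedback:
            return code, True
    return code, False


def toy_coder(task: str, feedback: List[str]) -> str:
    """Stand-in for an LLM call: switches to a safer API once told about eval."""
    if any("eval" in finding for finding in feedback):
        return "import ast\nresult = ast.literal_eval(user_input)"
    return "result = eval(user_input)"


def toy_static_analyzer(code: str) -> List[str]:
    """Flags a bare eval( call; \\b keeps literal_eval( from matching."""
    if re.search(r"\beval\(", code):
        return ["use of eval on untrusted input"]
    return []


def toy_fuzzer(code: str) -> List[str]:
    """Placeholder for dynamic testing; this stub never reports a crash."""
    return []


if __name__ == "__main__":
    code, ok = secure_generate(
        "parse user input", toy_coder, [toy_static_analyzer, toy_fuzzer]
    )
    print(ok)
    print(code)
```

Passing the analyzers as a list keeps the loop independent of how many checks are run; a real system would replace the stubs with an LLM call, a static analysis tool, and a fuzzing harness.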

Codexity: Secure AI-assisted Code Generation

Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models

CoSec: On-the-Fly Security Hardening of Code LLMs Via Supervised Co-Decoding

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

SecCoder: Towards Generalizable and Robust Secure Code Generation

Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis

SALLM: Security Assessment of Generated Code

LLM Security Guard for Code

Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward

Can LLMs Patch Security Issues?

Ocassionally Secure: A Comparative Analysis of Code Generation Assistants

An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation

HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization

From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting

Fixing Security Vulnerabilities with AI in OSS-Fuzz

Evil Geniuses: Delving into the Safety of LLM-based Agents