Abstract: Large Language Models (LLMs) have shown incredible potential in code generation tasks, and recent research in prompt engineering has enhanced LLMs' understanding of textual information. However, ensuring the accuracy of generated code often requires extensive testing and validation by programmers. While LLMs can typically generate code based on task descriptions, their accuracy remains limited, especially for complex tasks that require a deeper understanding of both the problem statement and the code generation process. This limitation is primarily due to the LLMs' need to simultaneously comprehend text and generate syntactically and semantically correct code, without having the capability to automatically refine the code. In real-world software development, programmers rarely produce flawless code in a single attempt based on the task description alone; they rely on iterative feedback and debugging to refine their programs. Inspired by this process, we introduce a novel architecture of LLM-based agents for code generation and automatic debugging: Refinement and Guidance Debugging (RGD). The RGD framework is a multi-LLM-based agent debugger that leverages three distinct LLM agents: Guide Agent, Debug Agent, and Feedback Agent. RGD decomposes the code generation task into multiple steps, ensuring a clearer workflow and enabling iterative code refinement based on self-reflection and feedback. Experimental results demonstrate that RGD exhibits remarkable code generation capabilities, achieving state-of-the-art performance with a 9.8% improvement on the HumanEval dataset and a 16.2% improvement on the MBPP dataset compared to the state-of-the-art approaches and traditional direct prompting approaches. We highlight the effectiveness of the RGD framework in enhancing LLMs' ability to generate and refine code autonomously.

What problem does this paper attempt to address?

This paper addresses the limited accuracy and reliability of large language models (LLMs) in code generation tasks. Although LLMs perform well at generating code from textual task descriptions, the generated code often requires extensive testing and verification by programmers. For complex tasks in particular, LLMs struggle to understand the task description and generate syntactically and semantically correct code at the same time. Existing multi-round code generation frameworks improve code quality through iterative generation and debugging, but they still have practical limitations, such as failing to use failed test cases effectively for self-repair and depending too heavily on the task description. To address this, the paper proposes a new architecture, RGD (Refinement and Guidance Debugging), which improves code generation quality through the collaboration of three specialized LLM agents. The Guide Agent produces guidance for code generation, the Debug Agent generates the initial code and debugs it, and the Feedback Agent analyzes failures and suggests corrections based on execution results. Through this phased workflow and iterative code refinement based on self-reflection and feedback, RGD aims to improve LLMs' code generation and automatic debugging, especially on complex tasks. The experimental results show significant gains on benchmark datasets: a 9.8% improvement over the current state-of-the-art method on HumanEval and a 16.2% improvement on MBPP. This indicates that the RGD framework can effectively enhance the ability of LLMs to generate high-quality code and that it performs well on challenging programming tasks.
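The three-agent workflow described above can be pictured as a simple loop. The following is a minimal sketch, not the paper's implementation: the summary does not give RGD's prompts, stopping rule, or interfaces, so the function names (`rgd_loop`, `run_tests`), the agent call signatures, and the `max_rounds` parameter are all hypothetical, and the agents are passed in as plain callables so that stubs can stand in for real LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def run_tests(code: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate code and return a description of each failed test."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # assumption: candidate code is trusted or sandboxed
        func = namespace[func_name]
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


def rgd_loop(
    task: str,
    func_name: str,
    tests: List[TestCase],
    guide_agent: Callable[[str], str],
    debug_agent: Callable[[str, str, Optional[str], Optional[str]], str],
    feedback_agent: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,  # hypothetical refinement budget
) -> Tuple[str, bool]:
    """Guide -> generate -> (test -> feedback -> refine) until tests pass."""
    guide = guide_agent(task)                    # Guide Agent: generation guidance
    code = debug_agent(task, guide, None, None)  # Debug Agent: initial code
    for _ in range(max_rounds):
        failures = run_tests(code, func_name, tests)
        if not failures:
            return code, True
        # Feedback Agent: analyze the failures and suggest corrections.
        feedback = feedback_agent(task, code, failures)
        # Debug Agent: refine the previous code using that feedback.
        code = debug_agent(task, guide, code, feedback)
    return code, not run_tests(code, func_name, tests)


# Stub agents standing in for LLM calls (purely illustrative).
def stub_guide(task: str) -> str:
    return "Return the sum of both arguments."


def stub_debug(task, guide, previous_code, feedback) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b\n"      # repaired after feedback


def stub_feedback(task: str, code: str, failures: List[str]) -> str:
    return "Failing cases: " + "; ".join(failures)


final_code, passed = rgd_loop(
    "Add two numbers", "add", [((1, 2), 3), ((0, 0), 0)],
    stub_guide, stub_debug, stub_feedback,
)
```

With these stubs, the first draft fails one test, the feedback step reports it, and the second draft passes. In a real setting each callable would wrap an LLM request, and executing generated code would need proper isolation.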

RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance

Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

Training LLMs to Better Self-Debug and Explain Code

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture

Teaching Large Language Models to Self-Debug

How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

Effective Large Language Model Debugging with Best-first Tree Search

Steering Large Language Models between Code Execution and Textual Reasoning

Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Fixing Code Generation Errors for Large Language Models

CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges

Automatic Robotic Development through Collaborative Framework by Large Language Models

ProgAI: Enhancing Code Generation with LLMs For Real World Challenges