AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui
2024-05-24
Abstract: The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. During the coding procedure, the programmer agent will focus on the code generation and refinement based on the test executor agent's feedback. The test designer agent will generate test cases for the generated code, and the test executor agent will run the code with the test cases and write the feedback to the programmer. This collaborative system ensures robust code generation, surpassing the limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 in HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while state-of-the-art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.
Computation and Language
What problem does this paper attempt to address?
The paper aims to address two main issues in code generation:

1. **Generation of Effective Test Cases**: Existing large language models (LLMs) can improve development efficiency when generating code, but they perform poorly at generating effective test cases. As a result, the generated code is difficult to test and verify thoroughly.
2. **Resource Overhead in Multi-Agent Frameworks**: Current multi-agent collaboration frameworks perform well in code generation tasks, but having too many agents leads to high communication and coordination costs, which in turn affects overall performance.

To tackle these issues, the paper proposes **AgentCoder**, a multi-agent framework consisting of three agents:

- **Programmer Agent**: Generates code from the programming requirements and refines it based on feedback from the Test Executor Agent.
- **Test Designer Agent**: Designs basic, edge, and large-scale test cases independently of the code generation process.
- **Test Executor Agent**: Executes the test cases and returns feedback to the Programmer Agent for code optimization.

The design goal of AgentCoder is to reduce communication overhead between agents while ensuring the objectivity and diversity of tests, thereby improving the overall efficiency and accuracy of code generation. Experimental results show that AgentCoder outperforms existing single-agent and multi-agent methods on multiple benchmark datasets. With GPT-4, for example, it reaches 96.3% and 91.8% pass@1 on HumanEval and MBPP, compared with 90.2% and 78.9% for the previous state of the art, while using fewer tokens overall (56.9K and 66.3K versus 138.2K and 206.5K).
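The interaction between the three agents can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`execute_tests`, `generate_with_feedback`), the callable signatures, the `max_rounds` limit, and the stub agents are all assumptions made for illustration. In a real system the programmer and test designer would each wrap an LLM call, and the generated code would run in a sandbox rather than through a bare `exec`.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical agent signatures; in practice each would wrap an LLM call.
Programmer = Callable[[str, Optional[str]], str]  # (task, feedback) -> code
TestDesigner = Callable[[str], str]               # task -> test code (asserts)


def execute_tests(code: str, tests: str) -> Tuple[bool, str]:
    """Test executor: run the candidate code against the tests.

    Returns (passed, feedback). NOTE: exec() on untrusted code is unsafe;
    a real executor should isolate this step in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(tests, namespace)  # run the assertions against them
    except Exception:
        return False, traceback.format_exc()
    return True, "all tests passed"


def generate_with_feedback(
    task: str,
    programmer: Programmer,
    test_designer: TestDesigner,
    max_rounds: int = 3,  # assumed iteration budget, not taken from the paper
) -> Tuple[str, bool, int]:
    """Iterate: generate code, execute tests, feed errors back to the programmer."""
    # Tests are derived from the task description only, never from the code.
    tests = test_designer(task)
    feedback: Optional[str] = None
    code, passed = "", False
    for round_no in range(1, max_rounds + 1):
        code = programmer(task, feedback)
        passed, feedback = execute_tests(code, tests)
        if passed:
            return code, True, round_no
    return code, passed, max_rounds


# --- Stub agents standing in for LLM calls (purely illustrative) ---
def stub_programmer(task: str, feedback: Optional[str]) -> str:
    # The first attempt is deliberately buggy; once feedback arrives,
    # the "refined" version is returned.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def stub_test_designer(task: str) -> str:
    # One basic, one edge, and one large-scale case.
    return (
        "assert add(1, 2) == 3\n"
        "assert add(-1, 1) == 0\n"
        "assert add(10**6, 10**6) == 2 * 10**6\n"
    )


code, passed, rounds = generate_with_feedback(
    "Write add(a, b) that returns the sum of a and b.",
    stub_programmer,
    stub_test_designer,
)
```

In this sketch only the test executor's feedback string flows back to the programmer, which mirrors the stated goal of keeping inter-agent communication small. The test designer runs once and never sees the generated code, so the tests cannot be biased toward a particular implementation.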