Huy Nhat Phan, Phong X. Nguyen, Nghi D. Q. Bui
Abstract: Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific SE tasks. We introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers' workflows. Comprising four specialized agents (Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent achieves state-of-the-art performance across diverse SE tasks: it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified for GitHub issue resolution, surpassing existing methods. Furthermore, HyperAgent demonstrates SOTA performance in repository-level code generation (RepoExec), and in fault localization and program repair (Defects4J), often outperforming specialized systems. This work represents a significant advancement towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages, potentially transforming AI-assisted software development practices.
What problem does this paper attempt to address?
This paper addresses the limitations of current systems for automating software engineering (SE) tasks when they face complex, multi-step development work. Specifically, existing autonomous software agents based on large language models (LLMs) usually handle only one kind of SE task. They lack generality and adaptability, and they cope poorly with the range of challenges that arise across multiple programming languages and development scenarios.
### Problem Background
In recent years, large language models (LLMs) have demonstrated remarkable capabilities in software engineering, especially in coding tasks such as code generation, completion, bug fixing, and refactoring. However, as SE tasks grow more complex, existing systems fall short on real-world problems. Most existing autonomous software agents focus on a single SE task, such as resolving GitHub issues or generating code, and their performance is limited on a wider range of SE tasks.
### Problems Proposed in the Paper
To overcome these limitations, the paper introduces HyperAgent, a generalist multi-agent system designed to solve a wide range of SE tasks. HyperAgent imitates the typical workflow of human developers using four specialized agents: Planner, Navigator, Code Editor, and Executor. These agents work together to cover the entire lifecycle of an SE task, from initial conception to final verification.
### Specific Problem Description
1. **Generality**: Existing systems can usually handle only specific types of SE tasks, while HyperAgent aims to solve multiple SE tasks, including but not limited to:
   - Resolving GitHub issues
   - Repository-level code generation
   - Fault localization and program repair
2. **Cross-language Support**: Existing systems are often limited to specific programming languages, while HyperAgent can handle tasks in different programming languages.
3. **Complex Task Handling**: Existing systems have difficulty with complex, multi-step SE tasks, while HyperAgent handles them through a modular and extensible design.
4. **Performance Optimization**: Existing systems become less efficient on large-scale tasks, while HyperAgent aims to reduce inference cost and improve overall performance on complex tasks.
### Solution
HyperAgent addresses the above problems in the following ways:
- **Multi-agent Architecture**: Each agent focuses on a different aspect of the SE task, making the system highly modular and adaptable.
- **Central Coordination Mechanism**: The Planner agent serves as the central decision-making unit, responsible for task decomposition, sub-task assignment, and feedback processing.
- **Asynchronous Communication Model**: A distributed message queue system (such as Redis) enables efficient parallel processing and load balancing.
- **Lightweight LLM Summarizer**: A lightweight LLM summarizes the information passed between agents, preserving its accuracy and completeness and reducing information loss.
Through these designs, HyperAgent can efficiently handle complex SE tasks in multiple programming languages and different development scenarios, providing a more general and powerful solution than existing systems.
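
The central coordination and asynchronous communication described above can be illustrated with a minimal sketch. It is not HyperAgent's actual implementation: an in-process `asyncio.Queue` stands in for the distributed message queue (the summary mentions Redis as one option), the child agents are placeholders that make no LLM or tool calls, and every name here (`Subtask`, `Result`, `worker`, `planner`, `run_demo`) is hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    role: str      # which child agent should handle the sub-task
    request: str   # instruction issued by the Planner


@dataclass
class Result:
    role: str
    summary: str   # condensed report returned to the Planner


async def worker(role, inbox, outbox):
    """Placeholder child agent: consume sub-tasks for one role and report back."""
    while True:
        task = await inbox.get()
        if task is None:  # sentinel: the Planner has no more work for this role
            break
        # A real agent would call an LLM and tools here (assumption: omitted).
        await asyncio.sleep(0)
        await outbox.put(Result(role, f"{role} handled: {task.request}"))


async def planner(plan):
    """Dispatch sub-tasks to per-role queues and collect the feedback."""
    roles = ("navigator", "editor", "executor")
    inboxes = {role: asyncio.Queue() for role in roles}
    outbox = asyncio.Queue()
    workers = [
        asyncio.create_task(worker(role, inboxes[role], outbox))
        for role in roles
    ]

    # Sub-task assignment: each request goes to the queue of its target agent.
    for task in plan:
        await inboxes[task.role].put(task)

    # Feedback processing: wait for one result per dispatched sub-task.
    results = [await outbox.get() for _ in plan]

    # Shut the workers down cleanly.
    for role in roles:
        await inboxes[role].put(None)
    await asyncio.gather(*workers)
    return results


def run_demo():
    plan = [
        Subtask("navigator", "locate the function that parses the config"),
        Subtask("editor", "patch the parser to accept empty values"),
        Subtask("executor", "run the test suite"),
    ]
    return asyncio.run(planner(plan))


if __name__ == "__main__":
    for result in run_demo():
        print(result.summary)
```

Because each role has its own queue, the Planner can hand out work without waiting for any single agent to finish. Replacing the in-process queues with a networked message queue would let the agents run as separate processes, which is the kind of parallelism the asynchronous communication model is meant to provide.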