Abstract: Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific SE tasks. We introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers' workflows. Comprising four specialized agents (Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent achieves state-of-the-art (SOTA) performance across diverse SE tasks: it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified for GitHub issue resolution, surpassing existing methods. Furthermore, HyperAgent demonstrates SOTA performance in repository-level code generation (RepoExec), and in fault localization and program repair (Defects4J), often outperforming specialized systems. This work represents a significant advancement towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages, potentially transforming AI-assisted software development practices.

What problem does this paper attempt to address?

This paper addresses the limitations of current systems for automating software engineering (SE) tasks when they face complex, multi-step development work. Existing autonomous software agents based on large language models (LLMs) usually handle only specific SE tasks. They lack generality and adaptability, and they do not cope well with the diverse challenges posed by multiple programming languages and development scenarios.

### Problem Background

In recent years, LLMs have demonstrated remarkable capabilities in software engineering, especially in coding tasks such as code generation, completion, error fixing, and refactoring. As the complexity of SE tasks increases, however, existing systems fall short on complex real-world tasks. Most existing autonomous software agents focus on a single SE task, such as resolving GitHub issues or generating code, and their performance is limited on a wider range of tasks.

### Proposed Approach

To overcome these limitations, the paper introduces HyperAgent, a generalist multi-agent system designed to solve a wide range of SE tasks. HyperAgent imitates the typical workflow of human developers through four specialized agents: Planner, Navigator, Code Editor, and Executor. These agents work together to cover the entire life cycle of an SE task, from initial conception to final verification.

### Specific Problem Description

1. **Generality**: Existing systems can usually handle only specific types of SE tasks, while HyperAgent aims to solve multiple SE tasks, including but not limited to:
   - Resolving GitHub issues
   - Code generation
   - Fault localization and program repair
2. **Cross-language Support**: Existing systems are often limited to specific programming languages, while HyperAgent can handle tasks in different programming languages.
3. **Complex Task Handling**: Existing systems have difficulty with complex, multi-step SE tasks, while HyperAgent handles them through a modular and extensible design.
4. **Performance Optimization**: Existing systems are less efficient on large-scale tasks, while HyperAgent is designed to handle them by optimizing inference cost and overall performance.

### Solution

HyperAgent addresses the above problems in the following ways:

- **Multi-agent Architecture**: Each agent focuses on a different aspect of an SE task, making the system highly modular and adaptable.
- **Central Coordination Mechanism**: The Planner agent serves as the central decision-making unit, responsible for task decomposition, sub-task assignment, and feedback processing.
- **Asynchronous Communication Model**: A distributed message queue system (such as Redis) provides efficient parallel processing and load balancing.
- **Lightweight LLM Summarizer**: Preserves the accuracy and integrity of information passed between agents and reduces information loss.

Through these designs, HyperAgent can handle complex SE tasks across multiple programming languages and development scenarios, providing a more general and more capable solution than existing systems.
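
The coordination pattern described under Solution (a central Planner that assigns sub-tasks to specialized agents over asynchronous queues and then collects their reports) can be sketched roughly as follows. This is a minimal illustration, not HyperAgent's implementation: it assumes Python's in-process `asyncio.Queue` as a stand-in for a distributed queue such as Redis, and the names `Subtask`, `worker`, `planner`, and `run_demo` are hypothetical. The agents here only echo their instructions, whereas a real agent would call an LLM and tools.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# All names below are hypothetical and chosen for illustration only;
# they are not taken from the HyperAgent codebase.


@dataclass
class Subtask:
    task_id: int
    role: str      # which specialized agent should handle the sub-task
    payload: str   # natural-language instruction issued by the Planner


async def worker(role: str, inbox: asyncio.Queue, results: asyncio.Queue) -> None:
    """Consume sub-tasks for one role and report results back to the Planner."""
    while True:
        task: Optional[Subtask] = await inbox.get()
        if task is None:  # sentinel: no more work for this agent
            break
        # Placeholder for real work (LLM call, code search, test run, ...).
        await results.put((task.task_id, f"{role} done: {task.payload}"))


async def planner(plan: List[Subtask]) -> List[str]:
    """Dispatch sub-tasks by role, then collect one report per sub-task."""
    roles = ["navigator", "editor", "executor"]
    inboxes: Dict[str, asyncio.Queue] = {r: asyncio.Queue() for r in roles}
    results: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(r, inboxes[r], results)) for r in roles]

    # Assign every sub-task to the inbox of the agent responsible for it.
    for task in plan:
        await inboxes[task.role].put(task)

    # Gather feedback; reports may arrive in any order, so key them by id.
    collected: Dict[int, str] = {}
    for _ in plan:
        report: Tuple[int, str] = await results.get()
        collected[report[0]] = report[1]

    # Shut the workers down cleanly.
    for r in roles:
        await inboxes[r].put(None)
    await asyncio.gather(*workers)
    return [collected[t.task_id] for t in plan]


def run_demo() -> List[str]:
    plan = [
        Subtask(0, "navigator", "locate the failing function"),
        Subtask(1, "editor", "patch the function"),
        Subtask(2, "executor", "run the test suite"),
    ]
    return asyncio.run(planner(plan))


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In this sketch all sub-tasks are dispatched up front. The summary also describes the Planner as processing feedback, which suggests that in practice dispatch would be iterative, with each report informing the next sub-task. Swapping the in-process queues for a networked message queue is what would allow agents to run as separate processes and share load.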

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Autonomous Agents in Software Development: A Vision Paper

Agentless: Demystifying LLM-based Software Engineering Agents

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Harnessing Pre-trained Generalist Agents for Software Engineering Tasks

CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Agents in Software Engineering: Survey, Landscape, and Vision

ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

CodeAgent: Autonomous Communicative Agents for Code Review

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension

CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology

Agent S: An Open Agentic Framework that Uses Computers Like a Human

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios

Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning