Abstract: Digital agents capable of automating complex computer tasks have attracted considerable attention due to their immense potential to enhance human-computer interaction. However, existing agent methods exhibit deficiencies in their generalization and specialization capabilities, especially in handling open-ended computer tasks in real-world environments. Inspired by the rich functionality of the App store, we present AgentStore, a scalable platform designed to dynamically integrate heterogeneous agents for automating computer tasks. AgentStore empowers users to integrate third-party agents, allowing the system to continuously enrich its capabilities and adapt to rapidly evolving operating systems. Additionally, we propose a novel core MetaAgent with the AgentToken strategy to efficiently manage diverse agents and utilize their specialized and generalist abilities for both domain-specific and system-wide tasks. Extensive experiments on three challenging benchmarks demonstrate that AgentStore surpasses the limitations of previous systems with narrow capabilities, particularly achieving a significant improvement from 11.21% to 23.85% on the OSWorld benchmark, more than doubling the previous results. Comprehensive quantitative and qualitative results further demonstrate AgentStore's ability to enhance agent systems in both generalization and specialization, underscoring its potential for developing the specialized generalist computer assistant. All our code will be made publicly available at https://chengyou-jia.github.io/AgentStore-Home.

What problem does this paper attempt to address?

The paper addresses the limited generalization and specialization of existing agents on complex computer tasks, especially open-ended tasks in real-world environments. Specifically:

1. **Limitations of Existing Agents**:
   - **Generalist agents lack specialization**: Existing generalist agents handle a broad range of tasks, but they perform poorly on tasks that require specialized knowledge and operations.
   - **Specialized agents lack generalization**: Specialized agents perform well within their own domains, but they struggle to generalize across applications or to broader system environments.
2. **Challenges in Real-World Environments**:
   - Real-world operating system (OS) environments contain a wide variety of open-ended tasks, and each task may require a different set of capabilities. For example, some tasks require specialized knowledge and operations, which existing agents handle poorly.
3. **Requirement for Dynamic Adaptation**:
   - As operating systems evolve and new applications emerge, the agent system must be able to integrate new agents dynamically to adapt to the rapidly changing environment.

To solve these problems, the paper proposes the **AgentStore** platform, which aims to improve both the generalization and the specialization of the agent system by dynamically integrating heterogeneous agents. AgentStore lets users quickly integrate third-party agents, so the system can continuously extend its functionality and adapt to evolving operating systems.

### Key Innovation Points

- **MetaAgent and AgentToken Strategy**: A MetaAgent built on a multimodal large language model (MLLM) is combined with the AgentToken strategy to manage and schedule a large number of heterogeneous agents efficiently. Each integrated agent is represented as a learnable token embedding in the MetaAgent architecture, which enables dynamic management and task allocation.
- **Automated Training Process**: An automated self-instruct training method allows AgentToken to be tuned without a large amount of manually collected data, which makes the platform more practical.

With these innovations, AgentStore outperforms existing systems on multiple benchmarks. In particular, on the OSWorld benchmark the success rate reached 23.85%, more than double the previous result of 11.21%.

### Summary

The core objective of this paper is to address the limited generalization and specialization of existing agents by building a platform that dynamically integrates heterogeneous agents, thereby improving the agent system's performance on complex computer tasks.
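
The token-per-agent idea summarized above can be illustrated with a small sketch. This is not the paper's implementation: in AgentStore the token embeddings are learned and the task representation comes from the MLLM, whereas here both are hand-set vectors. The class name `AgentStoreSketch`, its methods, and the example agent names are all hypothetical, chosen only to show how routing by token similarity lets a new agent be added without touching the existing ones.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentStoreSketch:
    """Toy registry (hypothetical): one embedding vector ("token") per agent."""

    dim: int
    tokens: Dict[str, List[float]] = field(default_factory=dict)
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, token: List[float],
                 handler: Callable[[str], str]) -> None:
        """Integrate a new agent by adding its token; other agents are untouched."""
        if len(token) != self.dim:
            raise ValueError(f"token must have {self.dim} dimensions")
        self.tokens[name] = list(token)
        self.handlers[name] = handler

    def route(self, task_state: List[float]) -> List[Tuple[str, float]]:
        """Rank agents by a softmax over dot products with the task vector."""
        if not self.tokens:
            raise RuntimeError("no agents registered")
        names = list(self.tokens)
        scores = [sum(t * s for t, s in zip(self.tokens[n], task_state))
                  for n in names]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)

    def dispatch(self, task: str, task_state: List[float]) -> str:
        """Send the task to the highest-ranked agent."""
        best_name, _ = self.route(task_state)[0]
        return self.handlers[best_name](task)


# Assumed stand-ins: in practice the task vector would come from the MetaAgent.
store = AgentStoreSketch(dim=3)
store.register("sheet_agent", [1.0, 0.0, 0.0], lambda t: f"sheet_agent handles: {t}")
store.register("slide_agent", [0.0, 1.0, 0.0], lambda t: f"slide_agent handles: {t}")
print(store.dispatch("sum column B", [0.9, 0.1, 0.0]))

# A third-party agent is integrated later by registering one more token.
store.register("terminal_agent", [0.0, 0.0, 1.0], lambda t: f"terminal_agent handles: {t}")
print(store.dispatch("list running processes", [0.0, 0.1, 0.95]))
```

Because each agent contributes only its own vector, registering `terminal_agent` leaves the routing scores between the two earlier agents unchanged relative to each other, which is the property that makes this kind of design easy to extend.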

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Agent S: An Open Agentic Framework that Uses Computers Like a Human

AgentStudio: A Toolkit for Building General Virtual Agents

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

AgentScope: A Flexible yet Robust Multi-Agent Platform

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

Agent Architecture and Collaboration for Supply Chain Management

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Agents: An Open-source Framework for Autonomous Language Agents

OpenAgents: An Open Platform for Language Agents in the Wild

GraphAgent: Agentic Graph Language Assistant

Very Large-Scale Multi-Agent Simulation in AgentScope

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

ProAgent: Building Proactive Cooperative Agents with Large Language Models

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale