AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Chengyou Jia, Minnan Luo, Zhuohang Dang, Qiushi Sun, Fangzhi Xu, Junlin Hu, Tianbao Xie, Zhiyong Wu
2024-10-24
Abstract: Digital agents capable of automating complex computer tasks have attracted considerable attention due to their immense potential to enhance human-computer interaction. However, existing agent methods exhibit deficiencies in their generalization and specialization capabilities, especially in handling open-ended computer tasks in real-world environments. Inspired by the rich functionality of the App store, we present AgentStore, a scalable platform designed to dynamically integrate heterogeneous agents for automating computer tasks. AgentStore empowers users to integrate third-party agents, allowing the system to continuously enrich its capabilities and adapt to rapidly evolving operating systems. Additionally, we propose a novel core **MetaAgent** with the **AgentToken** strategy to efficiently manage diverse agents and utilize their specialized and generalist abilities for both domain-specific and system-wide tasks. Extensive experiments on three challenging benchmarks demonstrate that AgentStore surpasses the limitations of previous systems with narrow capabilities, particularly achieving a significant improvement from 11.21% to 23.85% on the OSWorld benchmark, more than doubling the previous results. Comprehensive quantitative and qualitative results further demonstrate AgentStore's ability to enhance agent systems in both generalization and specialization, underscoring its potential for developing the specialized generalist computer assistant. All our code will be made publicly available at https://chengyou-jia.github.io/AgentStore-Home.
Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper addresses the insufficient generalization and specialization of existing agents on complex computer tasks, especially open-ended tasks in real-world environments. Specifically:

1. **Limitations of existing agents**:
   - **Generalist agents lack specialization**: they handle a broad range of tasks but perform poorly on tasks that require domain-specific knowledge and operations.
   - **Specialized agents lack generalization**: they excel within their own domains but struggle to generalize across applications or to broader system-wide tasks.
2. **Challenges in real-world environments**:
   - Real-world operating system (OS) environments contain many open-ended tasks, each of which may call for a different combination of capabilities. Tasks that demand specific knowledge and operations are where existing agents fall short.
3. **Need for dynamic adaptation**:
   - As operating systems evolve and new applications emerge, an agent system must be able to integrate new agents dynamically to keep pace with the changing environment.

To address these problems, the paper proposes **AgentStore**, a platform that improves both the generalization and the specialization of an agent system by dynamically integrating heterogeneous agents. AgentStore lets users quickly integrate third-party agents, so the system can continuously enrich its capabilities and adapt to evolving operating systems.

### Key Innovation Points

- **MetaAgent and AgentToken strategy**: A MetaAgent built on a multimodal large language model (MLLM) is combined with the AgentToken strategy to efficiently manage and schedule a large number of heterogeneous agents.
  Each integrated agent is represented as a learnable token embedding in the MetaAgent, which lets the MetaAgent manage agents dynamically and allocate tasks among them.
- **Automated training process**: An automated self-instruct training method allows the AgentTokens to be tuned without large amounts of manually annotated data, which makes the platform more practical.

Through these innovations, AgentStore outperforms existing systems on multiple benchmarks. On OSWorld in particular, its success rate reaches 23.85%, more than double the previous best result of 11.21%.

### Summary

The core objective of the paper is to remedy the limited generalization and specialization of existing agents by building a platform that dynamically integrates heterogeneous agents, thereby improving the agent system's performance on complex computer tasks.
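The AgentToken idea described under Key Innovation Points (one learnable token embedding per integrated agent) can be illustrated with a small routing sketch. This is not the authors' implementation. It assumes the MetaAgent's MLLM backbone is frozen and replaces it with fixed toy "hidden state" vectors, treats agent selection as a softmax over one embedding per agent, and uses a hypothetical class `AgentTokenRouter` and hypothetical agent names (`SheetAgent`, `SlideAgent`, `WebAgent`, `NewAppAgent`). Only the agent embeddings are updated during training.

```python
import math
import random


class AgentTokenRouter:
    """Hypothetical sketch: one learnable embedding ("agent token") per agent.

    Assumption: the MLLM backbone is frozen and not modeled; callers pass a
    fixed "hidden state" vector standing in for its representation of a task.
    """

    def __init__(self, agent_names, dim, seed=0):
        self._rng = random.Random(seed)
        self.dim = dim
        self.agent_names = []
        self.embeddings = []  # the only trainable parameters in this sketch
        for name in agent_names:
            self.add_agent(name)

    def add_agent(self, name):
        # Integrating an agent appends one small random embedding row;
        # existing rows are left unchanged.
        self.agent_names.append(name)
        self.embeddings.append([self._rng.gauss(0.0, 0.01) for _ in range(self.dim)])

    def probs(self, hidden):
        # Score each agent token by a dot product, then apply a stable softmax.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.embeddings]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def route(self, hidden):
        # Pick the agent whose token has the highest probability.
        p = self.probs(hidden)
        return self.agent_names[max(range(len(p)), key=p.__getitem__)]

    def train_step(self, hidden, target, lr=0.5):
        # Softmax cross-entropy: d(loss)/d(row_i) = (p_i - y_i) * hidden.
        p = self.probs(hidden)
        t = self.agent_names.index(target)
        for i, row in enumerate(self.embeddings):
            coeff = p[i] - (1.0 if i == t else 0.0)
            for j in range(self.dim):
                row[j] -= lr * coeff * hidden[j]


# Toy (hidden state, correct agent) pairs; values are made up for illustration.
examples = [
    ([1.0, 0.0, 0.0, 0.2], "SheetAgent"),
    ([0.0, 1.0, 0.1, 0.0], "SlideAgent"),
    ([0.0, 0.1, 1.0, 0.0], "WebAgent"),
]

router = AgentTokenRouter(["SheetAgent", "SlideAgent", "WebAgent"], dim=4)
for _ in range(50):
    for hidden, agent in examples:
        router.train_step(hidden, agent)

for hidden, agent in examples:
    print(f"{hidden} -> {router.route(hidden)} (expected {agent})")

# A new agent only adds a row; it needs its own training data before it can win.
router.add_agent("NewAppAgent")
```

Because each agent owns exactly one embedding row in this simplification, integrating a new agent appends a row and leaves the trained rows untouched, although the new token is never selected until it receives training examples. In the paper, the training data for the AgentTokens comes from the automated self-instruct process; the three hand-written vectors above only stand in for it.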