CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao
2024-06-19
Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning agent, to comprehend the problem scope, formulate macro-level plans and provide detailed sub-task descriptions to local execution agents, which serves as the initial rendition of a global plan. (2) A local execution agent, to operate within the multi-tier task execution structure, focusing on detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the difficulty that existing large language models (LLMs) have with complex real-world tasks. Although current LLMs perform well on many natural language processing (NLP) tasks, they still struggle with complex real-world tasks, even when equipped with advanced strategies such as CoT and ReAct. Specifically, these models fall short in the following areas:

1. **Complex Planning and Collaboration Abilities**: Existing LLMs lack effective high-level planning and collaboration abilities, which makes it hard for them to decompose and manage complex tasks efficiently.
2. **Adaptation to Errors and Uncertainty**: When faced with errors and uncertainty, existing LLMs struggle to adjust and replan, which leads to task execution failures.
3. **Limitations of a Single Model**: Current research mainly focuses on a single LLM with a single memory stream, which limits the model's ability to handle complex tasks.

To address these issues, the authors propose the CoAct framework, which applies hierarchical planning and collaboration patterns from human society to LLM systems. The CoAct framework includes two agents:

1. **Global Planning Agent**: Responsible for understanding the scope of the problem, formulating a macro-level plan, and providing detailed sub-task descriptions to the local execution agent. These descriptions serve as the initial version of the global plan.
2. **Local Execution Agent**: Operates within a multi-tier task execution structure, focusing on the detailed execution and implementation of specific tasks within the global plan.

Through this hierarchical planning and collaboration mechanism, the CoAct framework aims to improve the reasoning ability and adaptability of LLMs on complex real-world tasks.

Experimental results on the WebArena benchmark show that CoAct can rearrange its process trajectory when it encounters failures and that it outperforms baseline methods on long-horizon web tasks.
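The global-local division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SubTask`, `Result`, `run_hierarchy`, `toy_plan`, and `toy_execute` are hypothetical, the planner and executor are plain functions standing in for LLM-backed agents, and replanning is simplified to requesting a fresh plan after the first failed sub-task.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubTask:
    description: str


@dataclass
class Result:
    ok: bool
    detail: str = ""


# Hypothetical interfaces: in a real system both would wrap LLM calls.
# Planner: (task, failure feedback so far) -> ordered list of sub-tasks.
Planner = Callable[[str, List[str]], List[SubTask]]
# Executor: one sub-task -> outcome of trying to carry it out.
Executor = Callable[[SubTask], Result]


def run_hierarchy(task: str, plan: Planner, execute: Executor,
                  max_replans: int = 2) -> Tuple[bool, List[str]]:
    """Run a global plan, asking the planner for a new one after a failure."""
    feedback: List[str] = []  # failure reports passed back to the planner
    log: List[str] = []       # trace of every sub-task attempt
    for _ in range(max_replans + 1):
        failed = False
        for subtask in plan(task, feedback):
            result = execute(subtask)
            status = "ok" if result.ok else "failed"
            log.append(f"{subtask.description}: {status}")
            if not result.ok:
                # Report the failure upward and abandon the current plan.
                feedback.append(f"{subtask.description}: {result.detail}")
                failed = True
                break
        if not failed:
            return True, log
    return False, log


def toy_plan(task: str, feedback: List[str]) -> List[SubTask]:
    # Stand-in planner: switches strategy once any failure is reported.
    if feedback:
        return [SubTask("open search page"), SubTask("search by name")]
    return [SubTask("open search page"), SubTask("search by id")]


def toy_execute(subtask: SubTask) -> Result:
    # Stand-in executor: one particular sub-task always fails.
    if subtask.description == "search by id":
        return Result(False, "id field not found")
    return Result(True)


success, log = run_hierarchy("find product", toy_plan, toy_execute)
```

In this toy run the first plan fails at its second sub-task, the failure is fed back to the planner, and the second plan completes. Restarting from the top of each new plan keeps the sketch short; a fuller design might let the executor retry locally or resume from the failed step.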