Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even when equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning agent, which comprehends the problem scope, formulates macro-level plans, and provides detailed sub-task descriptions to local execution agents; these descriptions serve as the initial rendition of a global plan. (2) A local execution agent, which operates within the multi-tier task execution structure and focuses on the detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the challenges that existing large language models (LLMs) face when handling complex real-world tasks. Although current LLMs perform well on a range of natural language processing (NLP) tasks, they still struggle with complex real-world tasks, even with advanced strategies such as CoT and ReAct. Specifically, these models fall short in the following areas:

1. **Complex Planning and Collaboration Abilities**: Existing LLMs lack effective high-level planning and collaboration abilities, making it difficult to decompose and manage complex tasks efficiently.
2. **Adaptation to Errors and Uncertainty**: When faced with errors and uncertainty, existing LLMs struggle to adjust and replan, leading to task execution failures.
3. **Limitations of a Single Model**: Current research mainly focuses on a single LLM and a single memory stream, which limits the model's ability to handle complex tasks.

To address these issues, the authors propose the CoAct framework, which applies hierarchical planning and collaboration patterns from human society to LLM systems. The CoAct framework includes two agents:

1. **Global Planning Agent**: Responsible for understanding the scope of the problem, formulating a macro-level plan, and providing detailed sub-task descriptions to the local execution agent as the initial version of the global plan.
2. **Local Execution Agent**: Operates within a multi-tier task execution structure, focusing on the detailed execution and implementation of specific tasks within the global plan.

Through this hierarchical planning and collaboration mechanism, the CoAct framework aims to enhance the reasoning ability and adaptability of LLMs when handling complex real-world tasks. Experimental results on the WebArena benchmark show that CoAct can rearrange its process trajectory when it encounters failures and that it outperforms baseline methods on long-horizon web tasks.
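The global-local division of labor described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `GlobalPlanner`, `LocalExecutor`, `run`, `toy_plan`, and `toy_act` are hypothetical, the LLM calls are replaced by plain deterministic functions, and the choice to restart from a fresh plan after a failure is an assumption made for simplicity. It only shows the control flow in which a planner issues sub-tasks, an executor carries them out, and a failure is fed back so the planner can rearrange the trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GlobalPlanner:
    # Hypothetical stand-in for an LLM call: maps the task and the list of
    # previously failed sub-tasks to a new list of sub-task descriptions.
    plan_fn: Callable[[str, List[str]], List[str]]

    def plan(self, task: str, feedback: List[str]) -> List[str]:
        return self.plan_fn(task, feedback)


@dataclass
class LocalExecutor:
    # Hypothetical stand-in for an LLM-driven executor: True means success.
    act_fn: Callable[[str], bool]

    def execute(self, subtask: str) -> bool:
        return self.act_fn(subtask)


def run(task: str, planner: GlobalPlanner, executor: LocalExecutor,
        max_replans: int = 3) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Execute the plan; on a failed sub-task, report it and replan."""
    feedback: List[str] = []             # failed sub-tasks reported upward
    trace: List[Tuple[str, bool]] = []   # every attempted sub-task and outcome
    for _ in range(max_replans + 1):
        failed = None
        for subtask in planner.plan(task, feedback):
            ok = executor.execute(subtask)
            trace.append((subtask, ok))
            if not ok:
                failed = subtask
                break
        if failed is None:
            return True, trace           # every sub-task in the plan succeeded
        feedback.append(failed)          # assumption: replan from scratch
    return False, trace


# Toy scenario (invented for illustration): the first plan relies on a tab
# that cannot be clicked, so the planner switches to a search-based route.
def toy_plan(task: str, feedback: List[str]) -> List[str]:
    if "click 'Orders' tab" in feedback:
        return ["open search", "search for orders", "read total"]
    return ["click 'Orders' tab", "read total"]


def toy_act(subtask: str) -> bool:
    return subtask != "click 'Orders' tab"


success, trace = run("find order total",
                     GlobalPlanner(toy_plan), LocalExecutor(toy_act))
print(success, len(trace))
```

In this toy run the first sub-task fails, the failure is passed back to the planner, and the second plan completes, so the trace records one failed step followed by three successful ones. Keeping the planner and executor behind plain callables makes it easy to swap in real model calls without changing the loop.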

CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

Learning Intra-group Cooperation in Multi-agent Systems.

Building Cooperative Embodied Agents Modularly with Large Language Models

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

Coordinating Multi-Agent Reinforcement Learning Via Dual Collaborative Constraints

AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration

MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

Embodied LLM Agents Learn to Cooperate in Organized Teams

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

Executable Code Actions Elicit Better LLM Agents

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration

Learning to Use Tools via Cooperative and Interactive Agents