Abstract: The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.

What problem does this paper attempt to address?

The paper attempts to address two main issues:

1. **Limitations of existing large language model (LLM) agents in performing complex real-world tasks**:
   - **Error propagation**: After task decomposition, if an early subtask fails, the error propagates and causes the entire task to fail.
   - **Limited adaptability**: Existing methods typically rely on manually constructed, fixed sub-agents. This static design lacks generality and scalability, making it difficult to handle variable real-world tasks.
2. **Insufficiencies of existing benchmarks**:
   - **Lack of fine-grained evaluation metrics**: Current benchmarks often cannot evaluate the step-by-step progress of complex, multi-step tasks, so they fail to accurately reflect an agent's performance when a task is only partially completed.

To address these issues, the paper proposes a multi-agent framework based on dynamic task decomposition and agent generation (TDAG) and introduces a new benchmark (ItineraryBench) to evaluate agents' performance in complex tasks such as travel planning. Specifically:

- **TDAG Framework**:
  - **Dynamic task decomposition**: Dynamically decomposes complex tasks into smaller subtasks and adjusts subsequent subtasks in real time based on the outcome of previous subtasks.
  - **Agent generation**: Automatically generates a specialized sub-agent for each subtask, equipped with specific skill sets to better adapt to changing environments.
- **ItineraryBench Benchmark**:
  - **Fine-grained multi-faceted evaluation**: Goes beyond simple success/failure evaluation, employing a multi-level scoring system to assess agents' performance in task feasibility, constraint satisfaction, and time and cost efficiency.
  - **Gradual task complexity**: Tasks are designed around travel planning and increase in difficulty from simple to complex, simulating real-world problem-solving processes.
  - **Tool integration**: Integrates tools such as databases and Python interpreters to evaluate agents' ability to utilize external information.

Through these innovations, the paper aims to enhance the adaptability and effectiveness of agents in handling complex real-world tasks and to provide a more refined evaluation framework for measuring agents' performance.
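The control flow summarized above (decompose, generate a sub-agent per subtask, then revise the remaining subtasks based on each result) can be illustrated with a minimal Python sketch. This is not the paper's implementation: `Subtask`, `generate_subagent`, `replan`, `run`, and `partial_score` are hypothetical names, the "sold-out" failure rule stands in for a real LLM and tool calls, and the score is a toy partial-credit measure rather than ItineraryBench's actual scoring system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    description: str


@dataclass
class SubtaskResult:
    subtask: Subtask
    success: bool
    output: str


def generate_subagent(subtask: Subtask) -> Callable[[Subtask], SubtaskResult]:
    """Hypothetical agent generation: build a sub-agent tailored to one subtask."""
    def subagent(task: Subtask) -> SubtaskResult:
        # Toy rule (assumption): subtasks mentioning "sold-out" fail.
        # A real sub-agent would call an LLM and external tools here.
        ok = "sold-out" not in task.description
        status = "done" if ok else "failed"
        return SubtaskResult(task, ok, f"{status}: {task.description}")
    return subagent


def replan(pending: List[Subtask], last: SubtaskResult) -> List[Subtask]:
    """Hypothetical dynamic decomposition: revise pending subtasks after each result."""
    if last.success:
        return pending
    # On failure, schedule a fallback first so the error does not propagate.
    fallback = Subtask(last.subtask.description.replace("sold-out", "alternative"))
    return [fallback] + pending


def run(subtasks: List[Subtask], max_steps: int = 10) -> List[SubtaskResult]:
    """Execute subtasks one at a time, re-planning after every result."""
    pending = list(subtasks)
    history: List[SubtaskResult] = []
    while pending and len(history) < max_steps:
        current = pending.pop(0)
        agent = generate_subagent(current)  # one generated sub-agent per subtask
        result = agent(current)
        history.append(result)
        pending = replan(pending, result)
    return history


def partial_score(history: List[SubtaskResult]) -> float:
    """Toy partial-credit score: fraction of attempted subtasks that succeeded."""
    if not history:
        return 0.0
    return sum(r.success for r in history) / len(history)


plan = [
    Subtask("book sold-out train from Shanghai to Beijing"),
    Subtask("reserve hotel in Beijing"),
]
history = run(plan)
score = partial_score(history)
```

In this toy run the first subtask fails, a fallback subtask is scheduled ahead of the hotel reservation, and the score reflects partial progress instead of a single pass/fail outcome. The `max_steps` cap is an added safeguard against endless re-planning and is likewise an assumption.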

TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
