TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su
2024-02-16
Abstract: The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.
Computation and Language
What problem does this paper attempt to address?
The paper attempts to address two main issues:

1. **Limitations of existing large language model (LLM) agents in performing complex real-world tasks**:
   - **Error propagation**: After task decomposition, if an early subtask fails, the error propagates, leading to the failure of the entire task.
   - **Limited adaptability**: Existing methods typically rely on manually constructed fixed sub-agents. This static design lacks generality and scalability, making it difficult to handle variable real-world tasks.
2. **Insufficiencies of existing benchmarks**:
   - **Lack of fine-grained evaluation metrics**: Current benchmarks often cannot evaluate the step-by-step progress of complex, multi-step tasks, so they fail to accurately reflect an agent's performance when a task is only partially completed.

To address these issues, the paper proposes a multi-agent framework based on dynamic task decomposition and agent generation (TDAG) and introduces a new benchmark (ItineraryBench) to evaluate agents' performance in complex tasks such as travel planning. Specifically:

- **TDAG Framework**:
  - **Dynamic task decomposition**: Dynamically decomposes complex tasks into smaller subtasks and adjusts subsequent subtasks in real time based on the completion of previous subtasks.
  - **Agent generation**: Automatically generates specialized sub-agents for each subtask, equipped with specific skill sets to better adapt to changing environments.
- **ItineraryBench Benchmark**:
  - **Fine-grained multi-faceted evaluation**: Goes beyond simple success/failure evaluation, employing a multi-level scoring system to assess agents' performance in task feasibility, constraint satisfaction, and time cost efficiency.
  - **Gradual task complexity**: Tasks are designed around travel planning, increasing in difficulty from simple to complex, simulating real-world problem-solving processes.
  - **Tool integration**: Integrates tools such as databases and Python interpreters to evaluate agents' ability to utilize external information.

Through these innovations, the paper aims to enhance the adaptability and effectiveness of agents in handling complex real-world tasks and provide a more refined evaluation framework to measure agents' performance.
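
The interplay of dynamic task decomposition and agent generation described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Subtask`, `SubAgent`, `generate_agent`, `run_tdag`, and the `toy_*` helpers) is hypothetical, and the LLM calls that would perform decomposition, agent generation, and execution are replaced by plain functions. The sketch only shows the control flow: after each subtask finishes, the remaining plan is revised based on what actually happened, so an early failure does not have to propagate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    description: str
    result: str = ""
    succeeded: bool = False


@dataclass
class SubAgent:
    """Hypothetical sub-agent generated for exactly one subtask."""
    prompt: str
    tools: List[str] = field(default_factory=list)

    def run(self, subtask: Subtask,
            execute: Callable[[str, str], Tuple[bool, str]]) -> Subtask:
        # `execute` stands in for the LLM/tool call that does the real work.
        subtask.succeeded, subtask.result = execute(self.prompt, subtask.description)
        return subtask


def generate_agent(subtask: Subtask) -> SubAgent:
    # Stand-in for LLM-driven agent generation: the prompt and tool set
    # are tailored to the subtask (assumed rule, for illustration only).
    tools = ["database"] if "look up" in subtask.description else ["python"]
    return SubAgent(prompt=f"You are a specialist for: {subtask.description}",
                    tools=tools)


def run_tdag(task, decompose, replan, execute, max_steps=10):
    """Run subtasks one at a time, revising the remaining plan after each."""
    pending = decompose(task)          # initial decomposition
    done: List[Subtask] = []
    steps = 0
    while pending and steps < max_steps:
        subtask = pending.pop(0)
        agent = generate_agent(subtask)            # one agent per subtask
        done.append(agent.run(subtask, execute))
        pending = replan(task, done, pending)      # dynamic adjustment
        steps += 1
    return done


# Toy stand-ins so the sketch runs without any LLM.
def toy_decompose(task):
    return [Subtask("look up train times"), Subtask("compute total cost")]


def toy_execute(prompt, description):
    if "train" in description:         # simulate a failing early subtask
        return False, "no trains found"
    return True, f"done: {description}"


def toy_replan(task, done, pending):
    last = done[-1]
    if not last.succeeded and "train" in last.description:
        # Recover from the failure instead of letting it propagate.
        return [Subtask("look up bus times")] + pending
    return pending


if __name__ == "__main__":
    for step in run_tdag("plan a trip", toy_decompose, toy_replan, toy_execute):
        print(step.description, "->", "ok" if step.succeeded else "failed")
```

In this toy run the failed train lookup triggers a replacement subtask (a bus lookup) before the cost computation proceeds. A static plan with fixed sub-agents would have no such recovery step, which is the limitation the summary attributes to earlier methods.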