TaskLAMA: Probing the Complex Task Understanding of Language Models

Quan Yuan, Mehran Kazemi, Xin Xu, Isaac Noble, Vaiva Imbrasaite, Deepak Ramachandran

DOI: https://doi.org/10.48550/arXiv.2308.15299

2023-08-29

Abstract: Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
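To make the SCTD output structure concrete, here is a minimal sketch, assuming a hypothetical wedding-planning decomposition with made-up step names (not taken from the TaskLAMA dataset). Steps are nodes, each edge `(a, b)` is a temporal dependency meaning `a` must finish before `b`, and the hypothetical helper `topological_order` uses Kahn's algorithm both to confirm the graph is acyclic and to produce one valid execution order.

```python
from collections import deque

# Hypothetical decomposition of "plan a wedding" (illustrative only).
# Each edge (a, b) means step a must be completed before step b.
STEPS = ["set budget", "book venue", "send invitations", "hire caterer", "hold ceremony"]
EDGES = [
    ("set budget", "book venue"),
    ("set budget", "hire caterer"),
    ("book venue", "send invitations"),
    ("send invitations", "hold ceremony"),
    ("hire caterer", "hold ceremony"),
]


def topological_order(steps, edges):
    """Return one valid ordering of the steps using Kahn's algorithm.

    Raises ValueError if the edges contain a cycle, i.e. the graph is not a DAG.
    """
    successors = {s: [] for s in steps}
    indegree = {s: 0 for s in steps}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Start from steps with no unmet prerequisites, preserving input order.
    ready = deque(s for s in steps if indegree[s] == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Completing this step releases any successor whose prerequisites are all done.
        for nxt in successors[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Steps left unprocessed are stuck on a cycle.
    if len(order) != len(steps):
        raise ValueError("temporal dependencies contain a cycle")
    return order


print(topological_order(STEPS, EDGES))
# ['set budget', 'book venue', 'hire caterer', 'send invitations', 'hold ceremony']
```

Because only the edges constrain order, "hire caterer" and "send invitations" could run in either order here. A DAG captures this partial order, which a flat list of steps cannot express.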

Computation and Language

What problem does this paper attempt to address?

This paper addresses how to use the knowledge in large language models (LLMs) to decompose complex tasks, a problem it calls Structured Complex Task Decomposition (SCTD). The goal of SCTD is to break a complex real-world task (such as planning a wedding) into a directed acyclic graph (DAG) in which nodes are the steps required to complete the task and edges are the temporal dependencies between those steps. The paper's main contributions are:

1. A high-quality human-annotated dataset, TaskLAMA, built specifically for studying the understanding of complex real-world tasks.
2. New evaluation metrics that fairly measure LLM performance on SCTD and cannot be inflated simply by adding duplicate sub-steps.
3. Several LLM-based methods for improving SCTD performance (a 7% to 37% relative improvement over the base model), compared against baseline methods that do not use LLMs.
4. A comprehensive set of experiments showing that LLMs decompose complex tasks into steps effectively (a 15% to 280% relative improvement over the best baseline) but still fall short at predicting pairwise temporal dependencies between steps.

Together, these results show both the potential of LLMs for SCTD and their limitations in understanding the temporal dependencies of complex tasks, which points future research toward improving how LLMs understand complex tasks.
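The paper's exact metric definitions are not reproduced here. As a minimal sketch of the general idea behind a duplicate-resistant step metric, assuming a simple token-overlap (Jaccard) similarity and a greedy one-to-one matching (rather than an optimal assignment such as the Hungarian algorithm), the hypothetical `matched_precision_recall` below lets each gold step be matched at most once. Repeating a predicted step therefore cannot raise recall, and the extra copies lower precision. The threshold of 0.5 and the example steps are arbitrary assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two step descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def matched_precision_recall(predicted, gold, threshold=0.5):
    """Greedy one-to-one matching of predicted steps to gold steps.

    Each predicted step and each gold step is used at most once, so
    predicting the same step several times cannot inflate the score.
    """
    # Score every (predicted, gold) pair and consider the best pairs first.
    pairs = sorted(
        ((jaccard(p, g), i, j) for i, p in enumerate(predicted) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    matches = 0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are all below the similarity threshold
        if i in used_pred or j in used_gold:
            continue  # one side of this pair is already matched
        used_pred.add(i)
        used_gold.add(j)
        matches += 1
    precision = matches / len(predicted) if predicted else 0.0
    recall = matches / len(gold) if gold else 0.0
    return precision, recall


gold = ["book a venue", "hire a caterer", "send invitations"]
honest = ["book a venue", "send invitations"]
padded = honest + ["book a venue", "book a venue"]  # duplicated steps
print(matched_precision_recall(honest, gold))  # (1.0, 0.6666666666666666)
print(matched_precision_recall(padded, gold))  # (0.5, 0.6666666666666666)
```

Padding the prediction with duplicates leaves recall unchanged and halves precision, which is the behavior a fair step-level metric should have.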
