AutoGPT+P: Affordance-based Task Planning with Large Language Models

Timo Birr, Christoph Pohl, Abdelrahman Younes, Tamim Asfour

2024-02-17

Abstract: Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability by combining such models with classical planning algorithms to address their inherent limitations in reasoning capabilities. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and the objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects, by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an object-affordance mapping generated automatically using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98%, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan on the SayCan instruction set. Furthermore, we evaluated our approach on our newly created dataset with 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79%. The dataset and the code are publicly available at

Robotics, Artificial Intelligence

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper addresses the problem of converting natural language instructions into task plans for robots, in particular when objects needed for the task are missing from the scene. It proposes a system called **AutoGPT+P**, which combines an affordance-based scene representation with a planning system. Its main objectives are:

1. **Dynamically Capture the Initial State**: Address the difficulty that existing LLM-based planning methods have in dynamically capturing the initial state of the task planning problem.
2. **Handle Incomplete Information**: Complete tasks by exploring the scene, suggesting alternatives, or providing a partial plan even when necessary objects are missing.
3. **Enhance Planning Capability**: Combine object detection with an Object-Affordance Mapping (OAM), generated automatically using ChatGPT, to build the scene representation required for symbolic planning.

The system can solve planning tasks both under the closed-world assumption and with incomplete information, and its core planning tool automatically corrects semantic and syntactic errors. The reported experiments show that AutoGPT+P achieves a 98% success rate on the SayCan instruction set, compared with 81% for SayCan, the LLM-based planning method the paper treats as the state of the art. It also achieves a 79% success rate on a new dataset of 150 scenarios covering complex tasks with missing objects.
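The step from detected objects to symbolic planning facts can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mapping contents, the affordance names, and the function names (`OAM`, `scene_to_facts`, `find_substitutes`, `to_pddl_init`) are hypothetical, and the mapping is hard-coded here, whereas the paper generates it with ChatGPT. The sketch only shows why an affordance-based representation lets a planner reason about arbitrary objects and propose substitutes for missing ones.

```python
# Hypothetical object-affordance mapping (OAM). In the paper this mapping is
# generated automatically with ChatGPT; here it is hard-coded for illustration.
OAM = {
    "cup": {"grasp", "contain-liquid", "drink-from"},
    "bowl": {"grasp", "contain-liquid"},
    "knife": {"grasp", "cut"},
    "table": {"support"},
}


def scene_to_facts(detected_classes):
    """Turn detected object classes into (affordance, instance) facts.

    Each detection gets a unique instance name such as "bowl0", and one
    fact is emitted per affordance of its class. Unknown classes yield
    no facts.
    """
    facts = set()
    counts = {}
    for cls in detected_classes:
        index = counts.get(cls, 0)
        counts[cls] = index + 1
        instance = f"{cls}{index}"
        for affordance in OAM.get(cls, ()):
            facts.add((affordance, instance))
    return facts


def find_substitutes(required_affordances, facts):
    """Return instances in the scene that offer all required affordances."""
    by_instance = {}
    for affordance, instance in facts:
        by_instance.setdefault(instance, set()).add(affordance)
    required = set(required_affordances)
    return sorted(
        instance
        for instance, affordances in by_instance.items()
        if required <= affordances
    )


def to_pddl_init(facts):
    """Render the facts as a PDDL-style (:init ...) block."""
    lines = [f"  ({affordance} {instance})" for affordance, instance in sorted(facts)]
    return "(:init\n" + "\n".join(lines) + "\n)"


# Example scene without a cup: the task "bring something to hold water"
# needs an object that can be grasped and can contain liquid.
scene = scene_to_facts(["bowl", "knife", "table"])
substitutes = find_substitutes({"grasp", "contain-liquid"}, scene)
print(substitutes)  # ['bowl0']
print(to_pddl_init(scene))
```

Because the planner sees affordance predicates rather than object names, a missing cup does not make the task unsolvable in this sketch: any detected object with the required affordances is a candidate alternative. If no such object exists, `find_substitutes` returns an empty list, which corresponds to the cases where the system would have to explore further or provide only a partial plan.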

Embodied Task Planning with Large Language Models

Understanding the Capabilities of Large Language Models for Automated Planning

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Exploring and Benchmarking the Planning Capabilities of Large Language Models

Improving Planning with Large Language Models: A Modular Agentic Architecture

SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge

On the Planning, Search, and Memorization Capabilities of Large Language Models

LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning

RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks

Learning adaptive planning representations with natural language guidance

On the Planning Abilities of Large Language Models : A Critical Investigation

Generalized Planning in PDDL Domains with Pretrained Large Language Models

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

A framework for neurosymbolic robot action planning using large language models