Abstract: Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common-sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which result in failures to execute the planned actions, largely due to logical fallacies at high or low levels. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural language-driven robotic platform designed for task execution in unstructured environments. With a focus on scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self-reflection. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy-level embodied tasks, outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Additionally, ablation studies on moderate-to-hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search to enable more robust automation in complex state and action spaces.

What problem does this paper attempt to address?

This paper addresses robot task planning and execution in open-world environments. Specifically, it focuses on enabling robots to understand natural-language instructions and complete tasks in unfamiliar, unstructured environments. These tasks may involve identifying and manipulating new objects, as well as adapting to unknown obstacles and unexpected changes in the environment. To address these challenges, the paper proposes ConceptAgent, a natural-language-driven robot platform that aims to improve the reliability and efficiency of task execution through the following innovations:

1. **Precondition Grounding**: ConceptAgent integrates a precondition-verification mechanism that formally verifies action constraints before an action is executed, which prevents infeasible actions and supports failure recovery. This lets the agent maintain task progress even in an unstructured environment.
2. **LLM-Guided Monte Carlo Tree Search (LLM-MCTS)**: ConceptAgent combines LLM-guided tree search with a self-reflection mechanism, enabling the agent to explore future states and dynamically refine its action sequence. According to the paper, this improves planning efficiency and task completion rates, even in large, open-world state spaces.

Through these innovations, ConceptAgent is reported to perform well in both simulated and real-world environments, especially on complex, long-horizon embodied tasks. It generates sequences of discrete actions more effectively and reduces the accumulation and propagation of errors. The paper also evaluates ConceptAgent through a series of experiments, including comparisons with existing state-of-the-art LLM-driven reasoning baselines and physical mobile manipulation tests under different levels of environmental clutter.
The experimental results show that ConceptAgent outperforms the LLM-driven baselines on task completion rate (19% versus 10.26% and 8.11% on the easy-level simulation tasks), with the strongest results when precondition verification and LLM-guided tree search are combined.
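
The two mechanisms summarized above can be illustrated with a small sketch. This is not ConceptAgent's implementation: the toy predicates, the `Action` class, the fridge scenario, and `stub_prior` (a placeholder where an LLM score would go) are all hypothetical, and self-reflection is not modeled. The sketch only shows how a precondition check can filter infeasible actions inside a plain UCT-style Monte Carlo Tree Search.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:  # hypothetical symbolic action, not the paper's representation
    name: str
    pre: frozenset     # predicates that must hold before execution
    add: frozenset     # predicates made true by the action
    delete: frozenset  # predicates made false by the action


def unmet_preconditions(state, action):
    """Return the preconditions of `action` that do not hold in `state`."""
    return action.pre - state


def apply_action(state, action):
    """Apply `action` only if all of its preconditions hold in `state`."""
    missing = unmet_preconditions(state, action)
    if missing:
        raise ValueError(f"{action.name} infeasible, missing {sorted(missing)}")
    return (state - action.delete) | action.add


def stub_prior(state, action, goal):
    """Assumed stand-in for an LLM score: goal predicates true after `action`."""
    return len(goal & apply_action(state, action))


@dataclass
class Node:
    state: frozenset
    parent: Optional["Node"] = None
    action: Optional[Action] = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def search(start, goal, actions, iterations=300, horizon=8, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(start)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection and expansion, restricted to precondition-feasible actions.
        while depth < horizon and not goal <= node.state:
            legal = [a for a in actions if not unmet_preconditions(node.state, a)]
            tried = {child.action.name for child in node.children}
            untried = [a for a in legal if a.name not in tried]
            if untried:
                best = max(untried, key=lambda a: stub_prior(node.state, a, goal))
                node.children.append(Node(apply_action(node.state, best), node, best))
                node, depth = node.children[-1], depth + 1
                break
            if not node.children:
                break  # dead end: no feasible action from this state
            parent = node
            node = max(parent.children, key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
            depth += 1
        # Random rollout over feasible actions, up to the horizon.
        state = node.state
        for _ in range(horizon - depth):
            legal = [a for a in actions if not unmet_preconditions(state, a)]
            if goal <= state or not legal:
                break
            state = apply_action(state, rng.choice(legal))
        reward = 1.0 if goal <= state else 0.0
        # Backpropagate the rollout result to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited branch as the plan.
    plan, node = [], root
    while node.children and not goal <= node.state:
        node = max(node.children, key=lambda ch: ch.visits)
        plan.append(node.action)
    return plan


def fs(*names):
    return frozenset(names)


ACTIONS = [
    Action("pick_apple", fs("at_table", "apple_on_table"), fs("holding_apple"), fs("apple_on_table")),
    Action("go_to_fridge", fs("at_table"), fs("at_fridge"), fs("at_table")),
    Action("go_to_table", fs("at_fridge"), fs("at_table"), fs("at_fridge")),
    Action("open_fridge", fs("at_fridge"), fs("fridge_open"), fs()),
    Action("place_apple", fs("at_fridge", "fridge_open", "holding_apple"),
           fs("apple_in_fridge"), fs("holding_apple")),
]
START = fs("at_table", "apple_on_table")
GOAL = fs("apple_in_fridge")

# Placing the apple immediately is rejected; the check names what is missing.
print(sorted(unmet_preconditions(START, ACTIONS[-1])))
print([a.name for a in search(START, GOAL, ACTIONS)])
```

Because child nodes are only created from actions whose preconditions hold, every step in the returned sequence is executable from the state before it. The list of unmet predicates is also the kind of signal a recovery step could act on, for example by planning toward the missing predicates first.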

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs

Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models

Sequential Planning in Large Partially Observable Environments guided by LLMs

Leveraging LLMs, Graphs and Object Hierarchies for Task Planning in Large-Scale Environments

PLATO: Planning with LLMs and Affordances for Tool Manipulation

Inner Monologue: Embodied Reasoning through Planning with Language Models

KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

Improving Planning with Large Language Models: A Modular Agentic Architecture

Learning adaptive planning representations with natural language guidance

Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents