Abstract: This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinder the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: despite using less than 0.5% of paired training data, LLM-Planner achieves competitive performance with recent baselines that are trained using the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper "LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models" aims to address the following issues:

1. **High Data Cost and Low Sample Efficiency**:
   - Existing methods require a large amount of annotated data (i.e., pairs of natural language instructions and gold trajectories) to train versatile agents capable of completing complex tasks, leading to high data costs and low sample efficiency.
   - This cost and inefficiency limit the development of agents that can perform many tasks and learn new ones quickly.
2. **Lack of Dynamic Adaptability**:
   - Existing methods typically generate a static high-level plan (HLP) and then execute the entire plan. However, the same natural language instruction may require different plans in different environments.
   - Without the ability to adjust plans based on what the agent perceives, agents get stuck or fail when they encounter unforeseen situations.
3. **Challenges in Partially Observable Environments**:
   - In partially observable environments, agents need to handle unknown objects and environmental changes. Existing methods assume that all feasible actions (i.e., [action, object] pairs) can be enumerated in advance, which is difficult to achieve in practice, especially in complex environments.

### Solution

To address the above issues, the authors propose **LLM-Planner**, a planner based on large language models (LLMs) with the following features:

1. **Few-Shot Learning**:
   - LLM-Planner can generate high-quality high-level plans using less than 0.5% of the paired training data.
2. **Physical Environment Grounding**:
   - By injecting a list of the objects observed in the environment into the prompt, LLM-Planner generates plans that are grounded in the current environment, improving their feasibility.
3. **Dynamic Replanning**:
   - When the agent runs into difficulty while executing the current plan (e.g., it cannot find the target object, or an action fails), LLM-Planner regenerates the plan based on the new observations, helping the agent overcome the obstacle.
4. **Hierarchical Planning Model**:
   - LLM-Planner adopts a hierarchical planning model with a high-level planner and a low-level planner. The high-level planner generates the high-level plan (HLP), a sequence of sub-goals, and the low-level planner maps each sub-goal to a series of primitive actions that achieve it in the current environment and state.

### Experimental Validation

The authors conducted experiments on the ALFRED dataset, which includes diverse partially observable environments and various task types. The results show that, despite using less than 0.5% of the paired training data, LLM-Planner performs comparably to baseline methods trained on the full training data, and even outperforms some baselines on certain metrics. This demonstrates the effectiveness and practicality of LLM-Planner in few-shot settings.

### Conclusion

By proposing LLM-Planner, this paper addresses the shortcomings of existing methods in data cost, sample efficiency, and dynamic adaptability, providing new insights for developing versatile and sample-efficient agents.
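The grounding and replanning steps described under Solution can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt layout, the comma-separated plan format, and the names `build_prompt`, `parse_plan`, `run_episode`, `llm`, `execute`, and `observe` are all assumptions made for illustration. The `llm` argument stands in for any text-completion function, and `execute` stands in for a low-level planner that reports whether a sub-goal succeeded.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Subgoal = Tuple[str, str]  # (action, object), e.g. ("Navigation", "fridge")
Example = Tuple[str, Sequence[Subgoal]]  # (instruction, gold high-level plan)


def format_plan(plan: Iterable[Subgoal]) -> str:
    """Render sub-goals as 'Action object, Action object' (assumed format)."""
    return ", ".join(f"{action} {obj}" for action, obj in plan)


def parse_plan(text: str) -> List[Subgoal]:
    """Parse the assumed plan format; malformed chunks are skipped."""
    plan: List[Subgoal] = []
    for chunk in text.split(","):
        parts = chunk.split()
        if len(parts) == 2:
            plan.append((parts[0], parts[1]))
    return plan


def build_prompt(
    instruction: str,
    examples: Sequence[Example],
    completed: Sequence[Subgoal],
    visible_objects: Sequence[str],
) -> str:
    """Few-shot prompt with the observed objects injected for grounding."""
    lines = ["Create a high-level plan for completing a household task."]
    for ex_instruction, ex_plan in examples:  # the few-shot demonstrations
        lines.append(f"Task: {ex_instruction}")
        lines.append(f"Plan: {format_plan(ex_plan)}")
    lines.append(f"Task: {instruction}")
    lines.append(f"Completed plan: {format_plan(completed)}")
    lines.append(f"Visible objects: {', '.join(visible_objects)}")
    lines.append("Next plan:")
    return "\n".join(lines)


def run_episode(
    instruction: str,
    examples: Sequence[Example],
    llm: Callable[[str], str],
    execute: Callable[[Subgoal], bool],
    observe: Callable[[], Iterable[str]],
    max_replans: int = 3,
) -> Tuple[bool, List[Subgoal]]:
    """Execute a plan sub-goal by sub-goal, replanning when one fails."""
    completed: List[Subgoal] = []
    seen = set(observe())  # every object observed so far

    def ask_llm() -> List[Subgoal]:
        prompt = build_prompt(instruction, examples, completed, sorted(seen))
        return parse_plan(llm(prompt))

    plan = ask_llm()
    replans = 0
    while plan:
        subgoal = plan.pop(0)
        succeeded = execute(subgoal)  # low-level planner (hypothetical)
        seen.update(observe())  # fold new observations into the grounding
        if succeeded:
            completed.append(subgoal)
        elif replans < max_replans:
            replans += 1
            plan = ask_llm()  # regenerate the remaining plan
        else:
            return False, completed
    return True, completed
```

In this sketch the prompt is rebuilt from scratch on every replanning call, so the model always sees the completed sub-goals and the full set of objects observed so far. Capping the number of replans is a design choice of the sketch that keeps a failing episode from looping forever.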

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
