LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su
2023-03-30
Abstract: This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinder the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: despite using less than 0.5% of paired training data, LLM-Planner achieves competitive performance with recent baselines that are trained using the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner
Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Machine Learning, Robotics
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper "LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models" aims to address the following issues:

1. **High Data Cost and Low Sample Efficiency**:
   - Existing methods need a large amount of annotated data (pairs of natural language instructions and gold trajectories) to train versatile agents that can complete complex tasks, which makes them costly and sample-inefficient.
   - This cost and inefficiency limit the development of agents that can handle many tasks and learn new ones quickly.
2. **Lack of Dynamic Adaptability**:
   - Existing methods typically generate a static high-level plan (HLP) once and then execute the entire plan. However, the same natural language instruction may require different plans in different environments.
   - Without the ability to adjust the plan based on what it perceives, an agent gets stuck or fails when it encounters an unforeseen situation.
3. **Challenges in Partially Observable Environments**:
   - In partially observable environments, agents must handle unknown objects and environmental changes. Existing methods assume that all feasible actions (i.e., [action, object] pairs) can be enumerated in advance, which is difficult in practice, especially in complex environments.

### Solution

To address the above issues, the authors propose **LLM-Planner**, a planner based on large language models (LLMs) with the following features:

1. **Few-Shot Learning**:
   - LLM-Planner generates high-level plans using less than 0.5% of the paired training data, which makes it highly data-efficient.
2. **Physical Environment Grounding**:
   - By injecting a list of the objects observed in the environment into the prompt, LLM-Planner generates plans that are grounded in the current environment, which improves their feasibility.
3. **Dynamic Replanning**:
   - When the agent runs into difficulty while executing the current plan (e.g., it cannot find the target object, or an action fails), LLM-Planner regenerates the plan based on the new observations, helping the agent get unstuck.
4. **Hierarchical Planning Model**:
   - LLM-Planner adopts a hierarchical planning model with a high-level planner and a low-level planner. The high-level planner generates the high-level plan (HLP) as a sequence of sub-goals, and the low-level planner maps each sub-goal to a series of primitive actions that achieve it in the current environment and state.

### Experimental Validation

The authors conducted experiments on the ALFRED dataset, which includes diverse partially observable environments and various task types. The results show that, despite using less than 0.5% of the paired training data, LLM-Planner performs comparably to baseline methods trained on the full training data, and it outperforms some baselines on certain metrics. This demonstrates the effectiveness and practicality of LLM-Planner in few-shot settings.

### Conclusion

By proposing LLM-Planner, this paper addresses the shortcomings of existing methods in data cost, sample efficiency, and dynamic adaptability, and it offers a direction for developing versatile and sample-efficient embodied agents.
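The grounding and replanning ideas summarized above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the prompt layout, the function names (`build_prompt`, `parse_plan`, `run_episode`), the `max_replans` limit, the `ToyKitchen` environment, and the rule-based `stub_llm` that stands in for a real language model are all assumptions made for illustration. The sketch only shows the control flow: the list of observed objects is written into the prompt, and a failed sub-goal triggers a new plan based on the updated object list.

```python
from typing import Callable, List, Set, Tuple

Subgoal = Tuple[str, str]  # (high-level action, object), e.g. ("Navigation", "fridge")


def fmt(plan: List[Subgoal]) -> str:
    return ", ".join(f"{a} {o}" for a, o in plan) or "none"


def build_prompt(instruction, examples, completed, visible) -> str:
    """Few-shot prompt that also lists the objects observed so far (assumed layout)."""
    lines = ["Create a high-level plan for completing a household task."]
    for ex_task, ex_plan in examples:  # in-context examples
        lines.append(f"Task: {ex_task}\nPlan: {fmt(ex_plan)}")
    lines.append(f"Task: {instruction}")
    lines.append(f"Completed plan: {fmt(completed)}")
    lines.append("Visible objects: " + (", ".join(sorted(visible)) or "none"))
    lines.append("Next plan:")
    return "\n".join(lines)


def parse_plan(text: str) -> List[Subgoal]:
    """Parse 'Action object, Action object' into sub-goals; skip malformed steps."""
    steps = [s.split() for s in text.split(",")]
    return [(s[0], s[1]) for s in steps if len(s) == 2]


def run_episode(instruction: str, examples, llm: Callable[[str], str],
                execute: Callable[[Subgoal], bool], observe: Callable[[], Set[str]],
                max_replans: int = 3):
    """Execute sub-goals one by one; on failure, replan with the updated object list."""
    completed: List[Subgoal] = []
    seen = set(observe())
    plan = parse_plan(llm(build_prompt(instruction, examples, completed, seen)))
    replans = 0
    while plan:
        ok = execute(plan[0])
        seen |= observe()  # accumulate everything observed so far
        if ok:
            completed.append(plan.pop(0))
        elif replans < max_replans:
            replans += 1
            plan = parse_plan(llm(build_prompt(instruction, examples, completed, seen)))
        else:
            break  # give up after too many replans (assumed stopping rule)
    return completed, replans


class ToyKitchen:
    """Hypothetical stand-in for a simulator: the potato is inside a closed fridge."""

    def __init__(self) -> None:
        self.location, self.holding, self.fridge_open = "start", None, False
        self.visible: Set[str] = {"sink", "countertop"}
        self.in_sink: List[str] = []

    def observe(self) -> Set[str]:
        return set(self.visible)

    def execute(self, subgoal: Subgoal) -> bool:
        action, obj = subgoal
        if action == "Navigation":
            self.location = obj
            self.visible.add("fridge")  # moving around brings the fridge into view
            return True
        if action == "OpenObject" and obj == "fridge" and self.location == "fridge":
            self.fridge_open = True
            return True
        if action == "PickupObject" and self.location == "fridge" and self.fridge_open:
            self.holding = obj
            return True
        if action == "PutObject" and obj == "sink" and self.location == "sink" and self.holding:
            self.in_sink.append(self.holding)
            self.holding = None
            return True
        return False


def stub_llm(prompt: str) -> str:
    """Rule-based stand-in for an LLM: plans via the fridge once it has been seen."""
    visible = [l for l in prompt.splitlines() if l.startswith("Visible objects:")][-1]
    if "fridge" in visible:
        return ("Navigation fridge, OpenObject fridge, PickupObject potato, "
                "Navigation sink, PutObject sink")
    return "Navigation countertop, PickupObject potato, Navigation sink, PutObject sink"


env = ToyKitchen()
examples = [("Put an apple on the table.",
             [("Navigation", "apple"), ("PickupObject", "apple"),
              ("Navigation", "table"), ("PutObject", "table")])]
completed, replans = run_episode("Put a potato in the sink.", examples,
                                 stub_llm, env.execute, env.observe)
print(replans, env.in_sink)  # prints: 1 ['potato']
```

In this toy run the first plan looks for the potato on the countertop and the pickup fails. Because the fridge has been observed by then, the second prompt lists it, and the regenerated plan succeeds. In a real system, `execute` would be the low-level planner that turns each sub-goal into primitive actions, and `stub_llm` would be replaced by a call to an actual language model.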