Abstract: We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on the initial LLM-generated plan by integrating a description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performance. Further testing reveals our method's general effectiveness in widely adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design outperforms its counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.
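The selector's role can be illustrated with a minimal sketch: given parallel candidate sub-goals, order them by predicted steps to completion so the agent pursues the cheapest one first. In DEPS the estimate comes from a trained module; here `toy_estimator` is a hypothetical distance-based heuristic, and `rank_subgoals` and the `dist_to_*` state keys are illustrative names, not part of the paper or its released code.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (sub-goal, current state) -> predicted number of steps.
# In DEPS this role is played by a trained model; any callable works here.
StepEstimator = Callable[[str, Dict[str, float]], float]


def rank_subgoals(candidates: List[str], state: Dict[str, float],
                  estimate_steps: StepEstimator) -> List[str]:
    """Order parallel candidate sub-goals so the one expected to finish soonest comes first."""
    return sorted(candidates, key=lambda goal: estimate_steps(goal, state))


def toy_estimator(goal: str, state: Dict[str, float]) -> float:
    # Hypothetical heuristic: steps grow with the distance to the nearest
    # matching resource; resources not in the state are treated as unreachable.
    return 2.0 * state.get(f"dist_to_{goal}", float("inf"))


if __name__ == "__main__":
    # Toy state: distances (in blocks) to the nearest source of each resource.
    state = {"dist_to_log": 12.0, "dist_to_stone": 3.0, "dist_to_sand": 40.0}
    print(rank_subgoals(["log", "sand", "stone"], state, toy_estimator))
    # prints ['stone', 'log', 'sand']
```

Swapping `toy_estimator` for a learned predictor changes only the ranking signal; the selection step itself stays a sort over candidate sub-goals.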

What problem does this paper attempt to address?

This paper addresses the task-planning problem for multi-task agents in open-world environments. Specifically, it identifies two main challenges:

1. **Executing plans in an open-world environment**: Compared with traditional environments, tasks in open-world environments are long-horizon and require accurate, multi-step reasoning to complete.
2. **Feasibility of sub-tasks**: When ordering parallel sub-goals within a complex plan, existing planners do not consider how easily the current agent can complete each sub-goal, so the generated plans may be inefficient or even infeasible.

To address these challenges, the authors propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning method based on large language models (LLMs). DEPS improves the plan initially generated by the LLM as follows:

- **Describe**: When the controller fails to complete a sub-goal, the describer summarizes the current situation in text and sends it back to the LLM planner.
- **Explain**: The LLM, acting as an explainer, locates the errors in the previous plan.
- **Plan**: The planner uses the information from the describer and the explainer to refine the plan.
- **Select**: The selector, a trainable module, ranks parallel candidate sub-goals by the estimated number of steps needed to complete each one, thereby refining the initial plan.

Through these mechanisms, DEPS generates and adjusts plans more effectively in open-world environments and improves the success rate of multi-task agents. Experiments show that DEPS robustly completes more than 70 Minecraft tasks and nearly doubles overall performance. DEPS also performs well in non-open-ended domains such as ALFWorld and tabletop manipulation.
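The Describe, Explain and Plan steps above form a feedback loop, sketched minimally below. This is an illustration under stated assumptions, not the released DEPS code: the planner, controller, describer and explainer are hypothetical callables, replaced by toy stubs (`toy_plan`, `toy_execute`, `toy_describe`, `toy_explain`) in a made-up crafting scenario. A real system would back the planner and explainer with LLM calls. The Select step is left out to keep the loop short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    success: bool
    observation: str  # raw execution feedback that the describer summarizes


# Hypothetical interfaces for the four roles.
Planner = Callable[[str, Optional[str]], List[str]]  # (task, feedback) -> sub-goals
Controller = Callable[[str], Outcome]                # sub-goal -> outcome
Describer = Callable[[str, Outcome], str]            # (sub-goal, outcome) -> text
Explainer = Callable[[List[str], str], str]          # (plan, description) -> text


def deps_loop(task: str, plan: Planner, execute: Controller, describe: Describer,
              explain: Explainer, max_rounds: int = 3) -> bool:
    """Plan, execute; on failure describe and explain it, then replan with that feedback."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        current_plan = plan(task, feedback)                    # Plan (or replan)
        failed = False
        for goal in current_plan:
            outcome = execute(goal)
            if not outcome.success:
                description = describe(goal, outcome)          # Describe
                feedback = description + " " + explain(current_plan, description)  # Explain
                failed = True
                break
        if not failed:
            return True  # every sub-goal in this plan succeeded
    return False


# --- Toy stubs (hypothetical): the world state persists across rounds. ---
inventory = set()


def toy_plan(task: str, feedback: Optional[str]) -> List[str]:
    steps = ["mine_log", "craft_planks", "craft_wooden_pickaxe"]
    if feedback and "crafting_table" in feedback:
        steps.insert(2, "craft_crafting_table")  # repair suggested by the explanation
    return steps


def toy_execute(goal: str) -> Outcome:
    if goal == "craft_wooden_pickaxe" and "crafting_table" not in inventory:
        return Outcome(False, "no crafting_table nearby")
    inventory.add(goal.split("_", 1)[1])  # e.g. "mine_log" -> "log"
    return Outcome(True, "ok")


def toy_describe(goal: str, outcome: Outcome) -> str:
    return f"Sub-goal '{goal}' failed: {outcome.observation}."


def toy_explain(plan: List[str], description: str) -> str:
    if "crafting_table" in description:
        return "The plan lacks a crafting_table step."
    return "Unknown cause."


if __name__ == "__main__":
    print(deps_loop("craft a wooden pickaxe", toy_plan, toy_execute,
                    toy_describe, toy_explain))  # True
```

The first plan fails at the pickaxe step; the description and explanation mention the missing crafting table, so the second plan inserts that step and succeeds.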

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents.

Improving Planning with Large Language Models: A Modular Agentic Architecture

APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents

TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs

Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks

Language-Augmented Symbolic Planner for Open-World Task Planning

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

AdaPlanner: Adaptive Planning from Feedback with Language Models

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

Planning with Multi-Constraints via Collaborative Language Agents

Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

Query-Efficient Planning with Language Models

Leveraging LLMs, Graphs and Object Hierarchies for Task Planning in Large-Scale Environments

Sequential Planning in Large Partially Observable Environments guided by LLMs

Embodied Task Planning with Large Language Models

MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model