Open Grounded Planning: Challenges and Benchmark Construction

Shiguang Guo,Ziliang Deng,Hongyu Lin,Yaojie Lu,Xianpei Han,Le Sun

2024-06-05

Abstract: The emergence of large language models (LLMs) has increasingly drawn attention to the use of LLMs for human-like planning. Existing work on LLM-based planning either focuses on leveraging the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning approaches to learn decision-making for a limited set of actions within restricted environments. However, both approaches exhibit significant discrepancies from the open and executable requirements in real-world planning. In this paper, we propose a new planning task--open grounded planning. The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set, thereby ensuring the executability of the produced plan. To this end, we establish a benchmark for open grounded planning spanning a wide range of domains. Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains. The outcomes of this paper define and establish a foundational dataset for open grounded planning, and shed light on the potential challenges and future directions of LLM-based planning.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the challenge of generating executable plans in open domains. Existing LLM-based planning takes one of two routes: it either relies on the model's inherent language generation capabilities to produce free-form plans, or it uses reinforcement learning to learn decision-making over a limited set of actions in a restricted environment. Both routes fall short of real-world planning requirements, mainly in openness and executability. The paper therefore proposes a new task, Open Grounded Planning, which requires the model to generate an executable plan based on a variable action set, so that the generated plan can actually be carried out.

The main contributions of the paper are:

1. Proposing the concept of Open Grounded Planning, with the goal of enabling future AI systems to plan tasks in open domains and to ground their plans in an open, executable set of actions.
2. Constructing a benchmark that includes datasets from multiple domains, together with an automatic evaluation process, for evaluating different models and methods on the Open Grounded Planning task.
3. Introducing a new framework, Retrieve and Rewrite, to address the challenges of the task, and conducting comprehensive experiments on current state-of-the-art models, which show that existing models and methods still struggle with Open Grounded Planning.
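The notion of grounding a free-form plan in an action set, as summarized above, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the token-overlap similarity, the step-level executability score, and the example action set are all assumptions made for illustration. In particular, the plain lexical matching used here stands in for whatever retriever and LLM-based rewriting the Retrieve and Rewrite framework actually uses.

```python
def tokens(text):
    """Lowercased whitespace tokens of a step (toy tokenizer, an assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two step descriptions."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def executability(plan, action_set):
    """Fraction of plan steps that are exactly actions from the action set.

    A hypothetical step-level score, not the benchmark's official metric.
    """
    if not plan:
        return 0.0
    return sum(step in action_set for step in plan) / len(plan)


def retrieve_and_rewrite(draft_plan, action_set):
    """Replace each free-form step with its most similar available action.

    Toy stand-in: retrieval is lexical and 'rewriting' is a direct swap.
    """
    actions = sorted(action_set)  # sorted for deterministic tie-breaking
    return [max(actions, key=lambda a: jaccard(step, a)) for step in draft_plan]


# Hypothetical action set and free-form draft plan.
action_set = {
    "boil water",
    "add tea leaves",
    "pour water into cup",
    "wait three minutes",
}
draft = ["boil some water", "add the tea leaves", "pour hot water into a cup"]

grounded = retrieve_and_rewrite(draft, action_set)
print(executability(draft, action_set))     # no draft step is an exact action
print(executability(grounded, action_set))  # every grounded step is an action
```

Because the action set is passed in as an argument rather than fixed, the same functions work when the available actions change from task to task, which is the "variable action set" property the task emphasizes.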
