Abstract: Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper studies how well large language models (LLMs) reason about asynchronous plans. Specifically, the researchers ask the following questions:

1. **Can LLMs effectively perform asynchronous plan reasoning?**
   - Asynchronous plan reasoning means deciding which steps of a task must run in sequence and which can run in parallel, so that the total time cost is minimized. This is an important attribute of human intelligence but is very challenging for machines.
2. **Can LLMs complete asynchronous planning tasks without illustrations of the task-solving process?**
   - The study found that representative closed-source and open-source LLMs, including GPT-4 and LLaMA-2, perform poorly when they are not given such illustrations.
3. **How can new techniques improve LLMs' performance in asynchronous plan reasoning?**
   - The researchers propose a technique called "Plan Like a Graph" (PLaG), which combines graphs with natural language prompts and improves model performance.

### Main Contributions

1. **Generated and open-sourced an asynchronous plan reasoning benchmark, AsyncHow.**
   - The dataset contains 1.6K high-quality instances for evaluating LLMs on real-life tasks.
2. **Demonstrated that LLMs cannot efficiently perform asynchronous planning tasks without illustrations of the task-solving process.**
   - Even strong models such as GPT-4 perform poorly without them.
3. **Defined a formal measure of the complexity of naturalistic asynchronous planning tasks that predicts LLMs' performance trends.**
   - By framing task complexity in terms of the longest path problem in a graph, the researchers could quantify task difficulty and predict how model performance changes with it.
4. **Proposed the PLaG method, which improves the performance of state-of-the-art models across all task complexities.**
   - The improvement is consistent across tasks of varying complexity.
5. **Showed that, despite these improvements, state-of-the-art LLMs still perform poorly on complex tasks, indicating limits on using LLMs to simulate digital devices.**
   - As task complexity increases, model performance drops sharply.

### Conclusion

This study provides a benchmark and a prompting method for evaluating and improving LLMs on asynchronous plan reasoning, and it shows that current LLMs still degrade sharply as task complexity grows.
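The summary above describes asynchronous planning as mixing sequential and parallel steps to minimize total time, and ties task difficulty to the longest path in a graph. The sketch below illustrates that idea and is not the authors' code: it assumes a plan is a directed acyclic graph whose steps have known durations, and that unlimited steps may run in parallel. The function name `min_completion_time`, the breakfast example, and its durations are all hypothetical. Under these assumptions, the shortest possible completion time equals the length of the longest duration-weighted path through the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def min_completion_time(durations, prerequisites):
    """Return the shortest total time when independent steps run in parallel.

    durations:     dict mapping each step to its duration
    prerequisites: dict mapping a step to the steps that must finish first
    """
    # Map every step to its prerequisites (empty if it has none).
    graph = {step: prerequisites.get(step, ()) for step in durations}
    finish = {}
    # static_order() yields each step only after all of its prerequisites.
    for step in TopologicalSorter(graph).static_order():
        # A step can start once its slowest prerequisite has finished.
        start = max((finish[p] for p in graph[step]), default=0)
        finish[step] = start + durations[step]
    # The plan ends when the last step ends: the longest weighted path.
    return max(finish.values(), default=0)


# Hypothetical example plan (durations in minutes).
durations = {
    "boil water": 5,
    "brew tea": 3,
    "toast bread": 4,
    "butter toast": 1,
    "serve": 1,
}
prerequisites = {
    "brew tea": ["boil water"],
    "butter toast": ["toast bread"],
    "serve": ["brew tea", "butter toast"],
}

sequential_time = sum(durations.values())
parallel_time = min_completion_time(durations, prerequisites)
print(f"sequential: {sequential_time} min, asynchronous: {parallel_time} min")
```

In this example the critical path is boil water, brew tea, serve, so toasting and buttering the bread can overlap with the tea steps without lengthening the plan.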

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
