Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert
2024-06-03
Abstract: Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper studies how well large language models (LLMs) perform asynchronous plan reasoning. Specifically, the researchers pose the following questions:

1. **Can LLMs effectively perform asynchronous plan reasoning?**
   - Asynchronous plan reasoning requires deciding which steps must run sequentially and which can run in parallel so that the total time cost is minimized. Planning is a fundamental property of human intelligence, but it is very challenging to achieve in machines.
2. **Can LLMs complete asynchronous planning tasks without illustrations of the task-solving process?**
   - The study found that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, perform poorly on the AsyncHow benchmark when they are not given such illustrations.
3. **How can new techniques improve LLMs' performance in asynchronous plan reasoning?**
   - The researchers propose a technique called "Plan Like a Graph" (PLaG), which combines graphs with natural language prompts and achieves state-of-the-art results.

### Main Contributions

1. **Generated and open-sourced a high-quality asynchronous plan reasoning benchmark, AsyncHow.**
   - The dataset contains 1.6K high-quality instances for evaluating LLMs' performance on real-life tasks.
2. **Demonstrated that LLMs cannot efficiently perform asynchronous planning tasks without illustrations of the task-solving process.**
   - Even high-performance models such as GPT-4 perform poorly without them.
3. **Defined a formal method to measure the complexity of naturalistic asynchronous planning tasks, which successfully predicts LLMs' performance trends.**
   - By framing task complexity in terms of the longest path problem in a graph, the researchers could quantify task difficulty and predict model performance.
4. **Proposed the PLaG method, which significantly improves the performance of state-of-the-art models across all task complexities.**
   - The improvement is consistent across tasks of varying complexity.
5. **Showed that, despite these improvements, state-of-the-art LLMs still perform poorly on complex tasks, which points to the limits of using LLMs to simulate digital devices.**
   - As task complexity increases, model performance drops sharply.

### Conclusion

This study provides new methods and tools for evaluating and improving LLMs' performance in asynchronous plan reasoning, and it exposes the current limitations of LLMs on complex tasks. The authors see the work as a step towards using LLMs as efficient autonomous agents.
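
To make the longest-path view of task complexity from the Main Contributions more concrete, here is a minimal sketch of how the optimal time of an asynchronous plan can be computed. It is not the authors' code. It assumes a plan is given as step durations plus prerequisite relations that form a directed acyclic graph, and that any number of independent steps may run in parallel. The function name `min_completion_time` and the breakfast example, including its durations, are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def min_completion_time(durations, prerequisites):
    """Shortest total time for a plan whose independent steps run in parallel.

    durations:     dict mapping each step to the time it takes
    prerequisites: dict mapping a step to the steps that must finish first
    Assumes the prerequisite relation is acyclic and parallelism is unlimited.
    """
    # Map every step to its predecessors, the format TopologicalSorter expects.
    graph = {step: tuple(prerequisites.get(step, ())) for step in durations}
    finish = {}
    # static_order() yields each step only after all of its prerequisites.
    for step in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[step]), default=0)
        finish[step] = start + durations[step]
    # The latest finish time equals the duration-weighted longest path.
    return max(finish.values(), default=0)


# Hypothetical plan: toasting can overlap with boiling and brewing.
durations = {"boil water": 5, "brew tea": 3, "toast bread": 4, "serve": 1}
prerequisites = {"brew tea": ["boil water"], "serve": ["brew tea", "toast bread"]}

parallel_time = min_completion_time(durations, prerequisites)
sequential_time = sum(durations.values())
print(parallel_time, sequential_time)
```

In this example the critical path is boil water, brew tea, serve, so the parallel plan takes 9 time units, against 13 if every step ran one after another. A model that reasons correctly about the asynchronous plan has to find that critical path, either implicitly or explicitly.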