Abstract: Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures, such as planning in robotics, multi-hop question answering or knowledge probing, structured commonsense reasoning, and more. While LLMs have advanced the state-of-the-art on these tasks with structure implications, whether LLMs could explicitly process textual descriptions of graphs and structures, map them to grounded conceptual spaces, and perform structured operations remains underexplored. To this end, we propose NLGraph (Natural Language Graph), a comprehensive benchmark of graph-based problem solving designed in natural language. NLGraph contains 29,370 problems, covering eight graph reasoning tasks with varying complexity from simple tasks such as connectivity and shortest path up to complex problems such as maximum flow and simulating graph neural networks. We evaluate LLMs (GPT-3/4) with various prompting approaches on the NLGraph benchmark and find that 1) language models do demonstrate preliminary graph reasoning abilities, 2) the benefit of advanced prompting and in-context learning diminishes on more complex graph problems, while 3) LLMs are also (un)surprisingly brittle in the face of spurious correlations in graph and problem settings. We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems. Build-a-Graph and Algorithmic prompting improve the performance of LLMs on NLGraph by 3.07% to 16.85% across multiple tasks and settings, while how to solve the most complicated graph reasoning tasks in our setup with language models remains an open research question. The NLGraph benchmark and evaluation code are available at https://github.com/Arthur-Heng/NLGraph.

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) can solve graph-structured problems: given a graph described in natural language, can an LLM map it to a grounded conceptual space and perform structured operations to solve graph algorithm problems? Specifically, the researchers examine four questions:

1. **Initial graph-reasoning ability of LLMs**: How do LLMs perform on simple graph-reasoning tasks such as connectivity, cycle detection, and shortest path?
2. **Effect of advanced prompting methods**: How effective are different prompting methods (chain-of-thought, least-to-most, self-consistency, and others) on graph-reasoning tasks of varying complexity?
3. **Effectiveness of in-context learning**: Can few-shot in-context learning improve LLM performance on complex graph-reasoning tasks?
4. **Sensitivity to spurious correlations**: In specific problem settings, do LLMs rely on spurious correlations in ways that degrade their reasoning?

To explore these questions, the researchers constructed a benchmark named NLGraph, which contains 29,370 problems and covers eight graph-reasoning tasks, ranging from simple connectivity to complex maximum flow and graph neural network simulation. Using this benchmark, they evaluated LLMs (GPT-3/4) under different prompting methods and drew conclusions on each of the four questions.
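To make the setup concrete, the following is a minimal sketch of how a connectivity problem might be phrased in natural language and scored against an algorithmic ground truth. The prompt wording and the helper names (`describe_graph`, `connectivity_question`, `is_connected`, `score_answer`) are illustrative assumptions and are not taken from the NLGraph repository; the ground truth is computed with an ordinary breadth-first search.

```python
from collections import deque


def describe_graph(num_nodes, edges):
    """Render an undirected graph as plain English (hypothetical template)."""
    edge_text = " ".join(
        f"Node {u} is connected to node {v}." for u, v in edges
    )
    return (
        f"In an undirected graph with nodes numbered 0 to {num_nodes - 1}: "
        f"{edge_text}"
    )


def connectivity_question(num_nodes, edges, source, target):
    """Build a yes/no connectivity prompt (assumed wording, not the benchmark's)."""
    return (
        f"{describe_graph(num_nodes, edges)} "
        f"Question: Is there a path between node {source} and node {target}?"
    )


def is_connected(num_nodes, edges, source, target):
    """Ground truth via breadth-first search over an adjacency list."""
    adjacency = {node: [] for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def score_answer(model_answer, expected):
    """Compare a free-text answer that starts with yes/no to the ground truth."""
    text = model_answer.strip().lower()
    if text.startswith("yes"):
        return expected is True
    if text.startswith("no"):
        return expected is False
    return False  # Answers that are neither yes nor no count as wrong.


if __name__ == "__main__":
    example_edges = [(0, 1), (1, 2), (3, 4)]
    print(connectivity_question(5, example_edges, 0, 4))
    print("Ground truth:", is_connected(5, example_edges, 0, 4))
```

In this sketch the model's reply would be passed to `score_answer` together with the output of `is_connected`; harder tasks such as shortest path or maximum flow would need a different reference algorithm and a stricter answer parser.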

Can Language Models Solve Graph Problems in Natural Language?

Can LLMs perform structured graph reasoning?

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Graph Reasoning with Large Language Models via Pseudo-code Prompting

GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking

Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis

Are Large-Language Models Graph Algorithmic Reasoners?

GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets

Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Large Language Models on Graphs: A Comprehensive Survey

GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text

Graph Neural Prompting with Large Language Models

GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

GraphText: Graph Reasoning in Text Space

Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?

Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT