GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets

Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj K. Singh
2024-06-24
Abstract: Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data structure problems along with 2000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000, revealing that LLMs exhibit a better understanding of directed graphs compared to undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with an increase of 11.11%, 33.37%, and 33.37%, respectively.
Artificial Intelligence, Computation and Language, Machine Learning
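The abstract frames evaluation as coding challenges scored against test cases. As a rough illustration only, the sketch below shows one way such a harness could run a candidate solution over a problem's test cases and return the failing cases as feedback. The names `GraphCase`, `Report`, `evaluate`, and the sample problem `count_components` are hypothetical; they are not the GraphEval2000 framework's API and are not claimed to be one of its 40 problems.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical harness for illustration; not the GraphEval2000 API.

@dataclass
class GraphCase:
    args: tuple      # positional arguments passed to the candidate solution
    expected: Any    # expected return value

@dataclass
class Report:
    passed: int
    total: int
    failures: list = field(default_factory=list)  # (case index, expected, actual or error)

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0

def evaluate(solution: Callable, cases: list) -> Report:
    """Run a candidate solution on every test case and collect failures as feedback."""
    report = Report(passed=0, total=len(cases))
    for i, case in enumerate(cases):
        try:
            actual = solution(*case.args)
        except Exception as exc:  # a crash counts as a failure; the error becomes feedback
            report.failures.append((i, case.expected, f"error: {exc!r}"))
            continue
        if actual == case.expected:
            report.passed += 1
        else:
            report.failures.append((i, case.expected, actual))
    return report

# Hypothetical sample problem: count connected components of an undirected graph
# with nodes 0..n-1, using union-find.
def count_components(n: int, edges: list) -> int:
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:          # merging two distinct trees removes one component
            parent[ru] = rv
            components -= 1
    return components

cases = [
    GraphCase((5, [[0, 1], [1, 2], [3, 4]]), 2),
    GraphCase((3, []), 3),
    GraphCase((4, [[0, 1], [1, 2], [2, 3], [3, 0]]), 1),
]
report = evaluate(count_components, cases)
print(report.passed, report.total, report.failures)  # → 3 3 []
```

Recording the expected and actual value for each failure, rather than only a pass rate, is what would let a user feed concrete errors back to the model on the next attempt; that design choice is an assumption of this sketch.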
What problem does this paper attempt to address?
This paper addresses the limited ability of large language models (LLMs) to reason about graph-structured data. Although LLMs excel at natural language processing (NLP) tasks, they struggle with complex graph structures and multi-step reasoning. Specifically:

1. **Limitations of Existing Research**: Current studies show that LLMs can handle basic graph-related queries but perform poorly on more complex graph structures and multi-step reasoning tasks.
2. **Lack of Evaluation Benchmarks**: No comprehensive benchmark previously existed for systematically evaluating LLM reasoning on graph-structured data.

To address these problems, the paper introduces **GraphEval2000**, a dataset of 40 graph data structure problems and 2,000 test cases. With this dataset, researchers can evaluate LLMs on graph reasoning tasks and compare their performance across graph types (sparse, planar, regular, and complete graphs). The paper also proposes an instruction-based method, **Structured Symbolic Decomposition (SSD)**, which aims to strengthen the graph reasoning of LLMs by decomposing complex tasks into smaller symbolic subtasks. Experiments show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems by 11.11%, 33.37%, and 33.37%, respectively.

### Main Contributions:

1. **Constructing the GraphEval2000 Dataset**: The first dataset designed specifically to evaluate the graph reasoning ability of LLMs, containing 40 data structure problems and 2,000 test cases.
2. **Proposing an Evaluation Framework**: Built on GraphEval2000, the framework provides real-time feedback that helps users iteratively improve model performance.
3. **Establishing Benchmarks**: Eight popular LLMs are benchmarked, revealing how their performance differs across types of graph structures.
4. **Proposing the SSD Method**: By decomposing complex tasks into cognitive steps and action steps, SSD significantly improves LLM reasoning on complex graph problems.

In summary, this paper aims to fill a gap in LLM graph reasoning and provides tools and methods for improving how these models handle graph-structured data.
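The four graph types named above (sparse, planar, regular, and complete) have standard graph-theoretic definitions, so a test-case graph can be labeled mechanically. The function below is a hypothetical sketch, not code from GraphEval2000. It assumes a simple undirected graph given as an edge list, uses an assumed density cutoff for "sparse" (the paper's own criterion may differ), and substitutes Euler's edge bound for a full planarity test (such as the Boyer-Myrvold algorithm), so it can only rule planarity out, never confirm it.

```python
from itertools import combinations

# Hypothetical sketch; not code from GraphEval2000.
def classify(n: int, edges: list, sparse_density: float = 0.25) -> dict:
    """Label a simple undirected graph on nodes 0..n-1 (no self-loops, no duplicate edges)."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    present = {frozenset(e) for e in edges}
    # Density = |E| / (n choose 2); defined as 0 for graphs with fewer than two nodes.
    density = m / (n * (n - 1) / 2) if n >= 2 else 0.0
    return {
        # Complete: every pair of distinct nodes is adjacent.
        "complete": all(frozenset(p) in present for p in combinations(range(n), 2)),
        # Regular: all nodes share the same degree.
        "regular": len(set(deg)) <= 1,
        # Sparse: ASSUMED density cutoff, chosen only for illustration.
        "sparse": density <= sparse_density,
        # Euler's formula gives m <= 3n - 6 for simple planar graphs with n >= 3,
        # so exceeding it proves non-planarity (staying under it proves nothing).
        "exceeds_planar_bound": n >= 3 and m > 3 * n - 6,
    }

k5 = list(combinations(range(5), 2))  # complete graph on 5 nodes
print(classify(5, k5))
# → {'complete': True, 'regular': True, 'sparse': False, 'exceeds_planar_bound': True}
```

K5 is a convenient check: it is complete and 4-regular, and its 10 edges exceed the bound of 3·5 − 6 = 9, which matches the known fact that K5 is non-planar.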