Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang
2024-10-19
Abstract: The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to reason directly over prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically write programs based on popular libraries to solve such tasks, and can thus handle graphs of different scales. A question naturally arises: can LLMs analyze graphs like professionals? In this paper, we introduce ProGraph, a manually crafted benchmark containing 3 categories of graph tasks. The benchmark expects solutions based on programming instead of direct reasoning over raw inputs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. To bridge this gap, we propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries. By augmenting closed-source LLMs with document retrieval and fine-tuning open-source ones on the code, we show 11-32% absolute improvements in their accuracies. Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency in graph analysis. The benchmark, datasets and enhanced open-source models are available at https://github.com/BUPT-GAMMA/ProGraph.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limited ability of current large language models (LLMs) to perform graph analysis, especially on large-scale graph data. Existing benchmarks require LLMs to reason directly from prompts that describe graph topology, which restricts them to small graphs containing a few dozen nodes. In contrast, human experts usually write programs based on popular libraries and can therefore handle graphs of different scales. Specifically, the paper focuses on the following issues:
1. **Limitations of existing benchmarks**: Existing benchmarks require LLMs to reason directly from text that describes the graph structure, so they cover only small graphs (usually a few dozen nodes). This approach does not extend to the large-scale graphs found in practical applications.
2. **Limited reasoning depth**: Even with Chain-of-Thought (CoT) prompting, current LLMs have limited reasoning depth and struggle with complex, large-scale graph problems.
3. **Abstract problem descriptions**: Problem descriptions in existing benchmarks follow a single form and lack the context of real-world application scenarios.
To address these issues, the paper sets out the following research objectives:
- Construct a new benchmark, ProGraph, to evaluate whether LLMs can analyze graph data the way professionals do, by writing programs that call external APIs.
- Provide the LLM4Graph datasets, which contain document and code data, to improve the performance of LLMs on graph analysis tasks.
- Verify the effectiveness of these datasets experimentally, and show how techniques such as Retrieval-Augmented Generation (RAG) and instruction fine-tuning improve the graph analysis capabilities of LLMs.
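The retrieval-augmented setup named in the last objective can be pictured roughly as follows. This is a minimal sketch and not the paper's pipeline: the documentation snippets in `DOCS`, the helper names `tokenize`, `retrieve` and `build_prompt`, and the word-overlap scoring are all illustrative assumptions. A real system would more likely use an embedding-based retriever over the crawled library documents.

```python
import re

# Hypothetical documentation snippets (placeholders, not real crawled documents).
DOCS = {
    "doc_shortest_path": "Compute the shortest path between a source node and a target node in a graph.",
    "doc_connected_components": "Return the connected components of an undirected graph.",
    "doc_pagerank": "Rank nodes by importance using the link structure of the graph.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, docs, k=1):
    """Return the names of the k snippets sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(q_words & tokenize(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

def build_prompt(question, docs, k=1):
    """Prepend the retrieved snippets to the task so the model can ground its code."""
    context = "\n".join(f"{name}: {docs[name]}" for name in retrieve(question, docs, k))
    return (
        f"Relevant documentation:\n{context}\n\n"
        f"Task: {question}\n"
        "Write a Python program that solves the task."
    )

print(build_prompt("Find the shortest path between node A and node B", DOCS))
```

The point of the sketch is only the shape of the flow: retrieve documentation relevant to the question, then ask the model for a program rather than a direct answer.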
In summary, this paper aims to explore and enhance the capabilities of LLMs in graph analysis tasks, especially the ability to handle large-scale graph data by programming to call external APIs.
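To make the contrast between prompt-based reasoning and programming concrete, here is a hedged sketch. It assumes a toy ring graph and uses a standard-library breadth-first search as a stand-in for the library call a professional would normally make; the function names `ring_graph`, `shortest_path_length` and `describe_as_text` are hypothetical and do not come from the paper.

```python
from collections import deque

def ring_graph(n):
    """Adjacency list of a cycle with n nodes (a toy stand-in for real data)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def shortest_path_length(adj, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def describe_as_text(adj):
    """Serialize the edge list the way a prompt-based benchmark would describe it."""
    edges = sorted({tuple(sorted((u, v))) for u, nbrs in adj.items() for v in nbrs})
    return ", ".join(f"({u}, {v})" for u, v in edges)

small, large = ring_graph(20), ring_graph(20000)

# The textual description grows with the graph, so it quickly outgrows a prompt.
print(len(describe_as_text(small)), len(describe_as_text(large)))

# The program is the same size regardless of how large the graph is.
print(shortest_path_length(large, 0, 10000))
```

The program text stays fixed while the serialized graph grows with the number of edges, which is why a programming-based benchmark is not confined to graphs of a few dozen nodes.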