Abstract: The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to reason directly over prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically solve such tasks by writing programs based on popular libraries, and can thus handle graphs of different scales. This raises a natural question: can LLMs analyze graphs like professionals? In this paper, we introduce ProGraph, a manually crafted benchmark containing 3 categories of graph tasks. The benchmark expects solutions based on programming instead of direct reasoning over raw inputs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. To bridge this gap, we propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries. By augmenting closed-source LLMs with document retrieval and fine-tuning open-source ones on the code, we show 11-32% absolute improvements in their accuracies. Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency in graph analysis. The benchmark, datasets and enhanced open-source models are available at https://github.com/BUPT-GAMMA/ProGraph.
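The contrast the abstract draws, reasoning over a prompt that spells out the graph versus writing a program that computes the answer, can be illustrated with a single shortest-path query. This is a minimal sketch in plain Python using only the standard library: a hand-written breadth-first search stands in for a library call such as NetworkX's `shortest_path_length`, and the generator, graph sizes, prompt wording, and helper names (`make_random_graph`, `edge_list_prompt`, `shortest_path_length`) are hypothetical, not taken from ProGraph.

```python
import random
from collections import deque


def make_random_graph(num_nodes, extra_edges, seed=0):
    """Hypothetical generator: a path through all nodes plus random shortcuts."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(num_nodes)}
    # Linking node v to v + 1 keeps the whole graph connected.
    for v in range(num_nodes - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    # Random shortcut edges; self-loops are skipped, duplicates collapse in the sets.
    for _ in range(extra_edges):
        u, w = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != w:
            adj[u].add(w)
            adj[w].add(u)
    return adj


def edge_list_prompt(adj, source, target):
    """Direct-reasoning style: the whole topology has to be spelled out as text."""
    edges = sorted((u, w) for u in adj for w in adj[u] if u < w)
    lines = [f"Node {u} is connected to node {w}." for u, w in edges]
    lines.append(f"What is the length of the shortest path from {source} to {target}?")
    return "\n".join(lines)


def shortest_path_length(adj, source, target):
    """Programming style: breadth-first search over the adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None  # target is unreachable from source


if __name__ == "__main__":
    graph = make_random_graph(num_nodes=10_000, extra_edges=20_000)
    prompt = edge_list_prompt(graph, 0, 9_999)
    print(f"prompt length: {len(prompt):,} characters")
    print("shortest path length:", shortest_path_length(graph, 0, 9_999))
```

With 10,000 nodes the edge-list prompt runs to the order of a million characters, whereas the program that answers the query stays the same size whatever the scale of the graph.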

What problem does this paper attempt to address?

The problem this paper attempts to solve is that current large language models (LLMs) have limited capabilities in graph analysis tasks, especially when dealing with large-scale graph data. Existing benchmarks require LLMs to reason directly over prompts that describe graph topology, which restricts them to small graphs containing a few dozen nodes. In contrast, human experts usually solve such problems by writing programs based on popular libraries, and can therefore handle graphs of different scales. Specifically, the paper focuses on the following issues:

1. **Limitations of existing benchmarks**: Existing benchmarks require LLMs to reason directly over text that describes the graph structure, which limits them to small graphs (usually only a few dozen nodes). This approach does not scale to the large graphs found in practical applications.
2. **Limited reasoning depth**: Even with Chain-of-Thought (CoT) prompting, current LLMs have limited reasoning depth and struggle with complex, large-scale graph problems.
3. **Abstract problem descriptions**: The problem descriptions in existing benchmarks follow a single, uniform format and lack the context of real-world application scenarios.

To address these problems, the paper sets out the following research objectives:

- Construct a new benchmark, ProGraph, to evaluate whether LLMs can analyze graph data like professionals by writing programs that call external APIs.
- Provide the LLM4Graph datasets, which contain documents and code, to improve the performance of LLMs on graph analysis tasks.
- Verify the effectiveness of these datasets through experiments, showing how techniques such as Retrieval-Augmented Generation (RAG) and instruction fine-tuning improve the graph analysis capabilities of LLMs.
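The retrieval-augmented setting described in the objectives can be pictured with a toy example: score a small collection of API documentation snippets against the question, then prepend the best matches to the prompt so the model sees real function names before it writes code. This is a minimal sketch under stated assumptions: the three snippets are hand-written paraphrases of NetworkX documentation (not entries from the LLM4Graph corpus), the prompt template is hypothetical, and a simple bag-of-words overlap score stands in for whatever retriever the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical documentation snippets, paraphrased from NetworkX docs for illustration.
DOCS = [
    "networkx.shortest_path_length(G, source, target): length of the shortest path between two nodes",
    "networkx.pagerank(G, alpha=0.85): PageRank score of every node in the graph",
    "networkx.triangles(G): number of triangles that include each node",
]


def tokenize(text):
    """Lowercase alphanumeric tokens; dots and underscores split identifiers into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Bag-of-words overlap: number of shared tokens, counted with multiplicity."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(query, docs, k=1):
    """Return the k highest-scoring snippets; the stable sort keeps ties in original order."""
    ranked = sorted(docs, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Prepend the retrieved documentation to the task before asking for code."""
    context = "\n".join(retrieve(question, docs, k))
    return (
        f"Relevant API documentation:\n{context}\n\n"
        f"Task: {question}\nWrite Python code to solve it."
    )


if __name__ == "__main__":
    print(build_prompt("What is the shortest path length between node 3 and node 7?", DOCS))
```

Lexical overlap is used here only because it needs no model or index; a practical system would more likely use BM25 or embedding-based retrieval over the crawled documents.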
In summary, this paper aims to explore and enhance the capabilities of LLMs in graph analysis tasks, especially their ability to handle large-scale graph data by writing programs that call external APIs.

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking

Large Language Models on Graphs: A Comprehensive Survey

GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets

Can Language Models Solve Graph Problems in Natural Language?

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

GraphArena: Benchmarking Large Language Models on Graph Computational Problems

A Survey of Large Language Models for Graphs

A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications

Exploring the Potential of Large Language Models in Graph Generation

Are Large-Language Models Graph Algorithmic Reasoners?

A Survey of Graph Meets Large Language Model: Progress and Future Directions

Integrating Graphs With Large Language Models: Methods and Prospects

GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

GLBench: A Comprehensive Benchmark for Graph with Large Language Models

GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration

Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs