Similarity-based Neighbor Selection for Graph LLMs

Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang
2024-02-06
Abstract: Text-attributed graphs (TAGs) present unique challenges for direct processing by large language models (LLMs), yet the extensive commonsense knowledge and robust reasoning capabilities of LLMs offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS effectively improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Moreover, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at
Machine Learning, Artificial Intelligence, Computation and Language, Social and Information Networks
What problem does this paper attempt to address?
The main problem this paper addresses is how to use large language models (LLMs) effectively for node classification in text-attributed graphs (TAGs). Although LLMs perform well in language understanding, multi-step reasoning, vision, and robotics, they still face several challenges in node classification on TAGs, including over-squashing, heterophily, and insufficient integration of graph information. These problems limit the performance of LLMs on this task and make it difficult for them to outperform traditional supervised graph learning models such as graph convolutional networks (GCNs) and graph attention networks (GATs).

To solve these problems, the paper proposes the Similarity-based Neighbor Selection (SNS) method. SNS improves the quality of the selected neighbors through recursive neighbor selection and similarity-based neighbor ranking, thereby improving the graph representation and alleviating over-squashing and heterophily. Specifically, SNS uses SimCSE to measure and rank the text-attribute similarity between a node and its neighbors, and integrates the information of the top-ranked neighbors into the LLM prompt. This method improves the quality of graph information integration and demonstrates the potential of LLMs in processing graph-structured data.

The paper verifies the effectiveness of SNS through experiments on five widely used node classification benchmark datasets. The experimental results show that SNS significantly outperforms traditional GNN methods in the zero-shot scenario and achieves state-of-the-art performance on the PubMed dataset. In addition, SNS demonstrates better generalization ability and scalability, and it can effectively handle large-scale graph data.
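The neighbor selection and ranking steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. To stay self-contained, it replaces SimCSE with a toy bag-of-words embedding, and the hop-expansion stopping rule, the prompt wording, and every name in it (`embed`, `candidate_neighbors`, `select_neighbors`, `build_prompt`, the sample graph and labels) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a SimCSE sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def candidate_neighbors(adj, node, min_candidates, max_hops):
    """Collect neighbors hop by hop until enough candidates are found.

    The stopping rule (stop once `min_candidates` are collected) is an
    assumption, not a detail taken from the paper.
    """
    seen = {node}
    frontier = [node]
    candidates = []
    for _ in range(max_hops):
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        candidates.extend(next_frontier)
        if len(candidates) >= min_candidates or not next_frontier:
            break
        frontier = next_frontier
    return candidates


def select_neighbors(adj, texts, node, k, max_hops=2):
    """Rank candidate neighbors by text similarity to the target; keep the top k."""
    target = embed(texts[node])
    candidates = candidate_neighbors(adj, node, k, max_hops)
    ranked = sorted(
        candidates, key=lambda v: cosine(target, embed(texts[v])), reverse=True
    )
    return ranked[:k]


def build_prompt(texts, node, neighbors, labels):
    """Assemble a hypothetical zero-shot classification prompt."""
    lines = [f"Target paper: {texts[node]}", "Most similar neighbors:"]
    for i, v in enumerate(neighbors, 1):
        lines.append(f"{i}. {texts[v]}")
    lines.append("Choose one category: " + ", ".join(labels))
    return "\n".join(lines)


# Illustrative data only: a four-node star graph with made-up titles.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
texts = {
    0: "insulin therapy for type 2 diabetes",
    1: "graph neural networks for citation data",
    2: "type 2 diabetes and insulin resistance",
    3: "diabetes risk in children",
}
top = select_neighbors(adj, texts, node=0, k=2)
print(build_prompt(texts, 0, top, ["Type 1 diabetes", "Type 2 diabetes"]))
```

In this toy graph the off-topic neighbor (node 1) is ranked last and dropped, which is the intuition behind using similarity to filter neighbors in heterophilous graphs. A closer approximation would swap `embed` and `cosine` for real sentence embeddings from a SimCSE checkpoint.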