ComGPT: Detecting Local Community Structure with Large Language Models

Li Ni, Haowen Shen, Lin Mu, Yiwen Zhang, Wenjian Luo
2024-09-13
Abstract: Large Language Models (LLMs), like GPT, have demonstrated the ability to understand graph structures and have achieved excellent performance in various graph reasoning tasks, such as node classification. Despite their strong abilities in graph reasoning tasks, they lack specific domain knowledge and have a weaker understanding of community-related graph information, which hinders their capabilities in the community detection task. Moreover, local community detection algorithms based on seed expansion, referred to as seed expansion algorithms, often face the seed-dependent problem, community diffusion, and the free rider effect. To use LLMs to overcome the above shortcomings, we explore a GPT-guided seed expansion algorithm named ComGPT. ComGPT iteratively selects potential nodes by local modularity M from the detected community's neighbors, and subsequently employs LLMs to choose the node to join the detected community from these selected potential nodes. To address the above issues faced by LLMs, we improve the graph encoding method, called Incident, by incorporating community knowledge to improve LLMs' understanding of community-related graph information. Additionally, we design the NSG (Node Selection Guide) prompt to enhance LLMs' understanding of community characteristics. Experimental results demonstrate that ComGPT outperforms the comparison methods, thereby confirming the effectiveness of the improved graph encoding method and prompts.
Social and Information Networks
What problem does this paper attempt to address?
The paper addresses three deficiencies of existing local community detection algorithms based on seed expansion:

1. **Seed-dependent problem**: The quality of the detected community depends heavily on the choice of the initial seed node, which can make the results inaccurate or unstable.
2. **Community diffusion**: The detected community may absorb nodes from other communities, which blurs community boundaries and lowers detection accuracy.
3. **Free-rider effect**: Nodes irrelevant to the target community may be wrongly added because they increase the value of the scoring function, which reduces the purity of the community.

To solve these problems, the authors propose ComGPT, a seed expansion algorithm guided by large language models (LLMs). ComGPT improves on existing local community detection methods in the following ways:

- **Using LLMs to select nodes**: ComGPT first shortlists potential nodes by local modularity M, then uses GPT-3.5-turbo to choose which of them joins the community, instead of relying solely on a traditional scoring function. This helps alleviate the seed-dependent problem, community diffusion, and the free-rider effect.
- **Improving the graph encoding method**: The authors improve the graph encoding method, called Incident, by incorporating community information, which enhances LLMs' understanding of community-related graph information.
- **Designing specific prompts**: The authors design the NSG (Node Selection Guide) prompt to help LLMs better understand community characteristics and guide them to select appropriate nodes to add to the community.
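To make the graph encoding idea concrete, here is a minimal Python sketch of an incident-style text encoding extended with community information. It assumes that an incident-style encoding lists each node together with its neighbors; the function name `encode_incident_with_community`, the sentence templates, and the way community membership is phrased are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set

# Adjacency-set representation of an undirected graph (an assumption of this sketch).
Graph = Dict[int, Set[int]]


def encode_incident_with_community(
    graph: Graph, community: Optional[Set[int]] = None
) -> str:
    """Render a graph as text, one sentence per node, plus one community sentence.

    The wording is illustrative only; the paper's actual templates may differ.
    """
    lines = []
    for node in sorted(graph):
        neighbors = sorted(graph[node])
        if neighbors:
            joined = ", ".join(str(n) for n in neighbors)
            lines.append(f"Node {node} is connected to nodes {joined}.")
        else:
            lines.append(f"Node {node} has no connections.")
    if community:
        members = ", ".join(str(n) for n in sorted(community))
        # Hypothetical way of injecting community knowledge into the encoding.
        lines.append(f"The detected community currently contains nodes {members}.")
    return "\n".join(lines)


# Usage: a triangle 0-1-2 with a pendant node 3 attached to node 2.
example_graph: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
encoded = encode_incident_with_community(example_graph, community={0, 1})
print(encoded)
```

The resulting text could be placed in a prompt ahead of the node-selection instructions. Keeping the community statement as a separate sentence makes it easy to update as the community grows.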
In summary, this paper aims to improve local community detection by drawing on the natural language processing capabilities of LLMs, in order to overcome the limitations of existing methods and detect community structure in networks more accurately and stably.
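The overall expansion loop can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. Local modularity M is assumed to be the ratio of edges inside the community to edges leaving it. The LLM call is replaced by a pluggable `choose` callback, and `greedy_stand_in` is a hypothetical placeholder that accepts the top candidate only if it raises M. The names `expand`, `candidate_nodes`, `k`, and `max_size` are likewise illustrative.

```python
from typing import Callable, Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets
Chooser = Callable[[Graph, Set[int], List[int]], Optional[int]]


def local_modularity(graph: Graph, community: Set[int]) -> float:
    """Assumed definition: internal edge count divided by boundary edge count."""
    internal_twice = 0
    external = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                internal_twice += 1  # each internal edge is seen from both ends
            else:
                external += 1
    internal = internal_twice // 2
    return internal / external if external else float("inf")


def candidate_nodes(graph: Graph, community: Set[int], k: int) -> List[int]:
    """Top-k neighbors of the community, ranked by M after adding each one."""
    neighbors = {v for u in community for v in graph[u]} - community
    ranked = sorted(
        neighbors, key=lambda v: (-local_modularity(graph, community | {v}), v)
    )
    return ranked[:k]


def expand(
    graph: Graph, seed: int, choose: Chooser, k: int = 3, max_size: int = 50
) -> Set[int]:
    """Grow a community from the seed; `choose` stands in for the LLM decision."""
    community = {seed}
    while len(community) < max_size:
        candidates = candidate_nodes(graph, community, k)
        if not candidates:
            break
        picked = choose(graph, community, candidates)
        if picked is None or picked not in candidates:
            break  # the chooser declined to add any candidate
        community.add(picked)
    return community


def greedy_stand_in(
    graph: Graph, community: Set[int], candidates: List[int]
) -> Optional[int]:
    """Hypothetical placeholder for the LLM: accept the best candidate if M rises."""
    best = candidates[0]
    if local_modularity(graph, community | {best}) > local_modularity(graph, community):
        return best
    return None


# Usage: two triangles (0-1-2 and 3-4-5) joined by the bridge edge 2-3.
two_triangles: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
detected = expand(two_triangles, seed=0, choose=greedy_stand_in)
print(sorted(detected))
```

Passing the decision step as a callback keeps the candidate shortlist (scored by M) separate from the final choice. That separation mirrors the two-stage design described above and lets an LLM-backed chooser be swapped in without touching the loop.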