ProCom: A Few-shot Targeted Community Detection Algorithm

Xixi Wu, Kaiyu Xiong, Yun Xiong, Xiaoxin He, Yao Zhang, Yizhu Jiao, Jiawei Zhang
2024-08-14
Abstract: Targeted community detection aims to distinguish a particular type of community in the network. This is an important task with many real-world applications, e.g., identifying fraud groups in transaction networks. Traditional community detection methods fail to capture the specific features of the targeted community and detect all types of communities indiscriminately. Semi-supervised community detection algorithms, which have emerged as a feasible alternative, are inherently constrained by their limited adaptability and their substantial reliance on large amounts of labeled data, which demands extensive domain knowledge and manual effort. In this paper, we address these weaknesses in targeted community detection by focusing on few-shot scenarios. We propose ProCom, a novel framework that extends the "pre-train, prompt" paradigm, offering a low-resource, high-efficiency, and transferable solution. Within the framework, we devise a dual-level context-aware pre-training method that fosters a deep understanding of latent communities in the network, establishing a rich knowledge foundation for downstream tasks. In the prompt learning stage, we reformulate the targeted community detection task into pre-training objectives, allowing the extraction of specific knowledge relevant to the targeted community to facilitate effective and efficient inference. By leveraging both the general community knowledge acquired during pre-training and the specific insights gained from the prompt communities, ProCom exhibits remarkable adaptability across different datasets. We conduct extensive experiments on five benchmarks to evaluate the ProCom framework, demonstrating its state-of-the-art (SOTA) performance under few-shot scenarios, strong efficiency, and transferability across diverse datasets.
Social and Information Networks
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How can targeted community detection be performed effectively under low-resource conditions?** Specifically, both traditional and semi-supervised community detection algorithms have limitations when the goal is to identify a specific type of community. Traditional methods extract all types of communities indiscriminately, while semi-supervised methods rely on large amounts of labeled data and have limited adaptability. This paper therefore aims to overcome these limitations with an efficient and transferable targeted community detection framework that needs only a small number of labeled samples.

### Problem Background

1. **Limitations of Traditional Community Detection Methods**:
   - They cannot focus on a specific type of community; instead, they extract all communities in the network indiscriminately.
   - On complex networks, they may return many irrelevant communities, which defeats the purpose of targeted community detection.
2. **Limitations of Semi-supervised Community Detection Methods**:
   - They require a large amount of labeled data (usually 100 to 500 labeled instances), which is time-consuming to obtain in practice and demands substantial domain knowledge.
   - They have limited adaptability: each time a new targeted community detection task arises, relevant unlabeled data must be collected again and the model retrained.

### Solution

To solve these problems, the paper proposes a new framework named **ProCom**, which extends the "pre-train, prompt" paradigm to provide a low-resource, high-efficiency, and transferable solution. Specifically:

1. **Pre-training Stage**:
   - A dual-level context-aware pre-training method enables the model to develop a deep understanding of the latent community structure in the network and establishes a rich knowledge foundation.
   - The pre-training objectives are node-to-context proximity and context distinction, which capture both the internal structure of latent communities and their unique characteristics.
2. **Prompt Learning Stage**:
   - Using a small number of labeled (few-shot) prompt communities, the targeted community detection task is reformulated into the pre-training objectives, so that knowledge specific to the targeted community can be extracted.
   - Candidate communities are generated through proximity analysis and then matched by similarity against the prompt communities; the candidates most similar to the prompt communities are predicted as new targeted communities.

### Main Contributions

1. **Extension of the "Pre-train, Prompt" Paradigm**: Prompt learning is applied to community detection for the first time, reducing the reliance on large amounts of labeled data.
2. **Dual-level Context-aware Pre-training Method**: Enables the model to comprehensively understand the latent community structure in the network.
3. **Targeted Community-guided Prompt Mechanism**: Aligns the downstream task with the pre-training objectives to extract knowledge specific to the targeted community and enable efficient inference.
4. **Experimental Verification**: Extensive experiments on five real-world benchmark datasets demonstrate ProCom's superior performance, robustness, and transferability in the few-shot setting.

Through these improvements, ProCom can efficiently identify specific targeted communities under low-resource conditions and has broad potential applications.
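The summary names the two pre-training objectives (node-to-context proximity and context distinction) but not their exact form. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: both objectives are assumed to be InfoNCE-style contrastive losses over mean-pooled context embeddings, and the helper names, the temperature `tau`, the weight `lam`, and how context "views" are produced are all hypothetical. A real model would compute the embeddings with a trainable graph encoder and backpropagate through this loss.

```python
import math

# Node embeddings are plain lists of floats keyed by node id: a toy stand-in
# for the output of a (hypothetical) GNN encoder.


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vectors):
    """Context embedding = element-wise mean of member node embeddings (assumed readout)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def info_nce(anchor, positive_index, candidates, tau=0.5):
    """Negative log-softmax score of the positive candidate (temperature tau)."""
    logits = [dot(anchor, c) / tau for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive_index]


def node_to_context_loss(emb, anchors, contexts, tau=0.5):
    """Node level (assumed form): anchors[i] should score contexts[i] above other contexts."""
    ctx = [mean_pool([emb[u] for u in c]) for c in contexts]
    losses = [info_nce(emb[v], i, ctx, tau) for i, v in enumerate(anchors)]
    return sum(losses) / len(losses)


def context_distinction_loss(emb, views_a, views_b, tau=0.5):
    """Context level (assumed form): two views of one context match; different contexts do not."""
    za = [mean_pool([emb[u] for u in c]) for c in views_a]
    zb = [mean_pool([emb[u] for u in c]) for c in views_b]
    losses = [info_nce(za[i], i, zb, tau) for i in range(len(za))]
    return sum(losses) / len(losses)


def pretrain_loss(emb, anchors, contexts, views_a, views_b, lam=1.0, tau=0.5):
    """Hypothetical dual-level objective: weighted sum of the two losses."""
    return (node_to_context_loss(emb, anchors, contexts, tau)
            + lam * context_distinction_loss(emb, views_a, views_b, tau))


if __name__ == "__main__":
    emb = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 1.0]}
    aligned = node_to_context_loss(emb, [0, 2], [[0, 1], [2, 3]])
    swapped = node_to_context_loss(emb, [0, 2], [[2, 3], [0, 1]])
    print(aligned, swapped)  # matching node/context pairs give the lower loss
```

Using one contrastive form at both levels keeps the sketch short; the paper may define either objective differently.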
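The prompt-stage inference in the summary (proximity-based candidate generation, then similarity matching against the prompt communities) can be sketched as follows. This is a simplified illustration, not ProCom's actual procedure: it assumes node embeddings already come from the pre-trained encoder, grows one candidate per seed node from its direct neighbours using a cosine-similarity `threshold`, and scores each candidate by its best cosine similarity to any prompt community. The names `candidate_communities`, `match_prompts`, `threshold`, and `top_k` are hypothetical, and the prompt-tuning step of the real framework is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_pool(vectors):
    """Community embedding = element-wise mean of member embeddings (assumed readout)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def candidate_communities(adj, emb, threshold=0.8):
    """Proximity step: each seed keeps the neighbours whose embedding is close to its own."""
    seen = {}  # dict keys preserve insertion order and drop duplicate candidates
    for seed, nbrs in adj.items():
        members = {seed} | {u for u in nbrs if cosine(emb[u], emb[seed]) >= threshold}
        seen[frozenset(members)] = True
    return list(seen)


def match_prompts(candidates, emb, prompt_communities, top_k=1):
    """Matching step: rank candidates by their best similarity to any prompt community."""
    prompt_embs = [mean_pool([emb[u] for u in c]) for c in prompt_communities]
    known = {frozenset(c) for c in prompt_communities}
    scored = []
    for cand in candidates:
        if cand in known:  # only *new* communities are predicted
            continue
        members = sorted(cand)
        c_emb = mean_pool([emb[u] for u in members])
        score = max(cosine(c_emb, p) for p in prompt_embs)
        scored.append((score, members))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first; ties broken by member ids
    return [members for _, members in scored[:top_k]]


if __name__ == "__main__":
    # Toy graph: {0,1,2} is the labeled prompt community, {3,4} resembles it, {5,6} does not.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3], 5: [6], 6: [5]}
    emb = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0],
           3: [0.9, 0.1], 4: [0.9, 0.1], 5: [0.0, 1.0], 6: [0.0, 1.0]}
    cands = candidate_communities(adj, emb)
    print(match_prompts(cands, emb, [[0, 1, 2]], top_k=1))
```

Skipping candidates that coincide with a prompt community reflects the stated goal of predicting new communities that resemble the few labeled ones.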