Abstract: The rise of online social networks has fundamentally transformed the traditional way of social interaction and information dissemination, leading to a growing interest in precise community detection and in-depth network structure analysis. However, the complexity of network structures and potential issues like singularity and subjectivity in information extraction affect the accuracy of community detection. To overcome these challenges, we propose a new community detection algorithm, known as the Hierarchical Louvain (H-Louvain) algorithm. It enhances the performance of community detection through a multi-level processing and information fusion strategy. Specifically, the algorithm integrates graph compression techniques with the Hyperlink-Induced Topic Search (HITS) algorithm for initial network hierarchical partitioning, simultaneously filtering out low-quality posts and users while retaining critical information. Furthermore, the proposed method enhances semantic representation by automatically determining an appropriate number of attribute vector dimensions and obtaining attribute weight information through the calculation of self-authority values and the "minimum distance" attribute of posts. Lastly, the method creates an initial user training set through network re-partitioning in hierarchical layers and improves the Louvain algorithm for community partitioning by estimating the comprehensive influence of nodes. Extensive experimentation has demonstrated that the H-Louvain algorithm outperforms state-of-the-art comparative algorithms in terms of accuracy and stability in community detection based on real-world Twitter datasets.
What problem does this paper attempt to address?
This paper addresses the limited accuracy and efficiency of existing community detection algorithms on large-scale social network data that contains complex network structures, low-quality posts, and zombie users. Specifically:
1. **Interference from low-quality posts and zombie users**: Online social networks contain large numbers of low-quality posts (such as spam and false information) and zombie users (such as fake or inactive accounts). This irrelevant data significantly increases the computational cost of the algorithm and may reduce the accuracy of community detection.
2. **Manually defined semantic features**: Most existing community detection algorithms that incorporate attribute information rely on manually defined semantic features, which can overlook important semantic information or introduce subjectivity and bias.
3. **Neglect of users' global activity**: Existing algorithms usually consider only the local influence of users and ignore their global activity and connection patterns across the entire network, both of which may strongly affect how communities form and evolve.
To solve these problems, the paper proposes a new community detection algorithm named H-Louvain. The algorithm aims to improve community detection performance through multi-level processing and information fusion. The specific improvements are:
- **Combining graph compression with the HITS algorithm**: Graph compression and the HITS algorithm are used to produce an initial hierarchical partition of the network. This reduces network complexity and filters out low-quality posts and zombie users while retaining key information.
- **Automatically determining the dimension of attribute vectors**: The LDA model is used to determine the dimension of the attribute vectors automatically. Attribute weights are obtained from the self-authority value and the "minimum distance" attribute of posts, which enhances the semantic representation of communities.
- **Introducing the concept of user activity**: The activity level and information dissemination ability of each user in the network are defined and measured, and an initial user training set is constructed from them. This gives a clearer picture of user behavior and status and supplies additional context for community detection.

With these improvements, experiments on a real-world Twitter dataset show that H-Louvain outperforms other state-of-the-art community detection algorithms in both accuracy and stability.
### Formula Summary
1. **The formula for updating edge weights after graph compression**:
\[
WS(v_j, v_k) = W(v_j, v_k) + \frac{1}{2} \cdot W(v_i, v_j) \cdot W(v_i, v_k)
\]
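The update rule above can be illustrated with a small sketch. The summary does not define the symbols, so this example assumes that \(v_i\) is the node being removed by compression, that \(v_j\) and \(v_k\) are two of its neighbors, that the graph is undirected, and that a missing edge has weight 0. The function name `compress_node` and the edge representation are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def compress_node(weights, v_i):
    """Remove v_i and update the weights between its neighbors.

    `weights` maps frozenset({a, b}) -> W(a, b) for an undirected graph.
    Assumption: v_i is the compressed node and a missing edge has weight 0.
    """
    # Collect each neighbor of v_i with the weight of the connecting edge.
    neighbors = {}
    for edge, w in weights.items():
        if v_i in edge and len(edge) == 2:
            (other,) = edge - {v_i}
            neighbors[other] = w

    # Keep every edge that does not touch v_i.
    compressed = {e: w for e, w in weights.items() if v_i not in e}

    # WS(v_j, v_k) = W(v_j, v_k) + 1/2 * W(v_i, v_j) * W(v_i, v_k)
    for v_j, v_k in combinations(sorted(neighbors), 2):
        key = frozenset((v_j, v_k))
        bonus = 0.5 * neighbors[v_j] * neighbors[v_k]
        compressed[key] = compressed.get(key, 0.0) + bonus
    return compressed


example = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.2,
}
result = compress_node(example, "a")
print(result[frozenset(("b", "c"))])
```

In this toy graph, removing `a` raises the weight of the edge between `b` and `c` from 0.2 by 0.5 · 1.0 · 0.5 = 0.25, so the connection that used to run through `a` is partly preserved.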
2. **Iterative calculation of post authority values and user centrality**:
\[
d.a = \sum_{u} u.h
\]
\[
u.h = \sum_{d} d.a
\]
\[
A_k = M^T \cdot M \cdot A_{k - 1}
\]
\[
H_k = M \cdot M^T \cdot H_{k - 1}
\]
Here, \(A_k\) and \(H_k\) are the vectors of post authority values and user centrality values (the hub scores of HITS) after the \(k\)-th iteration, and \(M\) is the link matrix between users and posts. For the products to be well defined, the rows of \(M\) index users and its columns index posts.
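The iteration can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: the function name `hits`, the fixed iteration count, and the L2 normalization after each step are assumptions. Normalization is standard practice for HITS to keep the scores bounded, but it does not appear in the formulas above.

```python
import math


def _normalize(values):
    """Scale a vector to unit L2 norm (an assumption; keeps scores bounded)."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else values


def hits(M, iterations=50):
    """Return (authority, hub) for a user-by-post link matrix M.

    M[u][d] is 1 if user u is linked to post d, else 0.
    """
    n_users, n_posts = len(M), len(M[0])
    hub = [1.0] * n_users
    authority = [1.0] * n_posts
    for _ in range(iterations):
        # d.a = sum of u.h over the users linked to post d  (A = M^T H)
        authority = [
            sum(M[u][d] * hub[u] for u in range(n_users))
            for d in range(n_posts)
        ]
        # u.h = sum of d.a over the posts linked to user u  (H = M A)
        hub = [
            sum(M[u][d] * authority[d] for d in range(n_posts))
            for u in range(n_users)
        ]
        authority = _normalize(authority)
        hub = _normalize(hub)
    return authority, hub


# Three users and three posts; every user links to post 1.
M = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
authority, hub = hits(M)
print(authority, hub)
```

In this example post 1 receives the highest authority value because all three users link to it, and users 0 and 2 receive equal hub values that exceed that of user 1, since each of them links to one additional post.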
Through these methods, the H-Louvain algorithm can perform community detection more efficiently and accurately, especially when dealing with large-scale social network data.