Latent Geometry Inspired Graph Dissimilarities Enhance Affinity Propagation Community Detection in Complex Networks

Carlo Vittorio Cannistraci, Alessandro Muscoloni
DOI: https://doi.org/10.48550/arXiv.1804.04566
2018-08-29
Abstract: Affinity propagation is one of the most effective unsupervised pattern recognition algorithms for data clustering in high-dimensional feature space. However, the numerous attempts to test its performance for community detection in complex networks have been attaining results very far from the state of the art methods such as Infomap and Louvain. Yet, all these studies agreed that the crucial problem is to convert the unweighted network topology in a 'smart-enough' node dissimilarity matrix that is able to properly address the message passing procedure behind affinity propagation clustering. Here we introduce a conceptual innovation and we discuss how to leverage network latent geometry notions in order to design dissimilarity matrices for affinity propagation community detection. Our results demonstrate that the latent geometry inspired dissimilarity measures we design bring affinity propagation to equal or outperform current state of the art methods for community detection. These findings are solidly proven considering both synthetic 'realistic' networks (with known ground-truth communities) and real networks (with community metadata), even when the data structure is corrupted by noise artificially induced by missing or spurious connectivity.
Machine Learning, Social and Information Networks, Physics and Society
What problem does this paper attempt to address?
The main problem this paper addresses is how to apply the Affinity Propagation (AP) algorithm effectively to community detection in complex networks. Although AP performs well for data clustering in high-dimensional feature spaces, its performance in community detection has been far below that of state-of-the-art methods such as Infomap and Louvain. Previous studies agree that the key problem is how to convert an unweighted network topology into a node dissimilarity matrix that is "smart" enough to guide the message-passing process behind AP correctly.

### Specific Problem Description

1. **Limitations of Existing Methods**:
   - Previous studies have tried various ways to construct the dissimilarity matrix, but most are based on hand-designed or known network science metrics and lack a theoretical basis.
   - These methods do not exploit the latent geometry of the network, which results in poor performance of AP in community detection.
2. **Proposed New Method**:
   - The paper introduces a conceptual innovation: using notions from network latent geometry to design a dissimilarity matrix suitable for AP.
   - Two dissimilarity measures are proposed: the Repulsion-Attraction rule (RA), based on local information, and Edge Betweenness Centrality (EBC), based on global information.
3. **Verification and Evaluation**:
   - Experiments on synthetic networks (with known ground-truth communities) and real networks (with community metadata) demonstrate the effectiveness of the new method.
   - The results show that the latent geometry inspired dissimilarity measures bring AP to equal or outperform current state-of-the-art community detection methods, and that this holds even when the network structure is corrupted by noise from missing or spurious links.

### Mathematical Formulas

1. **Repulsion-Attraction Rule (RA)**:
   \[ RA_{ij} = \frac{1 + e_i + e_j}{1 + cn_{ij}} \]
   where \( e_i \) and \( e_j \) are the external degrees of nodes \( i \) and \( j \) respectively (the number of neighbors of a node that are neither common neighbors of the pair nor the other node itself), and \( cn_{ij} \) is the number of common neighbors of nodes \( i \) and \( j \).
2. **Edge Betweenness Centrality (EBC)**:
   \[ EBC_{ij} = \sum_{s, t} \frac{\sigma(s, t | e_{ij})}{\sigma(s, t)} \]
   where \( \sigma(s, t) \) is the number of shortest paths between nodes \( s \) and \( t \), and \( \sigma(s, t | e_{ij}) \) is the number of those shortest paths that pass through edge \( e_{ij} \).

### Conclusion

The core objective of the paper is to design a more effective dissimilarity matrix by introducing the concept of network latent geometry, thereby significantly improving the performance of the AP algorithm for community detection in complex networks. The experimental results show that this innovation brings AP to a level comparable to existing state-of-the-art methods and, in some cases, beyond them.
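To make the RA and EBC formulas given under Mathematical Formulas concrete, the following is a minimal pure-Python sketch that evaluates both on the edges of a toy graph (two triangles joined by a single bridge). It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_adjacency`, `ra_dissimilarity`, `bfs_counts`, `edge_betweenness`) and the toy edge list are hypothetical, and the EBC sum is taken over unordered node pairs, which is one common convention for undirected graphs that the summary does not specify. How the edge values are extended to non-adjacent node pairs and passed to affinity propagation is not described in this summary and is not covered here.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ra_dissimilarity(adj, i, j):
    """RA_ij = (1 + e_i + e_j) / (1 + cn_ij) for a pair of adjacent nodes."""
    common = adj[i] & adj[j]
    # External degree: neighbors that are neither common neighbors
    # of the pair nor the other endpoint.
    e_i = len(adj[i] - common - {j})
    e_j = len(adj[j] - common - {i})
    return (1 + e_i + e_j) / (1 + len(common))


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """EBC for every edge, summing over unordered pairs {s, t} (assumed convention)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    ebc = {edge: 0.0 for edge in edges}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for u, v in edges:
            through = 0
            # The edge may be crossed in either direction on a shortest s-t path.
            for a, b in ((u, v), (v, u)):
                if (a in dist_s and b in dist_t
                        and dist_s[a] + 1 + dist_t[b] == dist_s[t]):
                    through += sigma_s[a] * sigma_t[b]
            ebc[(u, v)] += through / sigma_s[t]
    return ebc


# Toy graph (hypothetical): triangles {0, 1, 2} and {3, 4, 5} joined by bridge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build_adjacency(toy_edges)
ra = {(u, v): ra_dissimilarity(adj, u, v) for u, v in toy_edges}
ebc = edge_betweenness(adj)

for edge in toy_edges:
    print(edge, "RA =", ra[edge], "EBC =", ebc[edge])
```

On this toy graph the bridge edge (2, 3) receives the largest value under both measures (RA = 5, EBC = 9), while edges inside a triangle score between 0.5 and 4. This matches the intuition in the summary that links between communities should look more dissimilar than links within them. The all-pairs loop is written for readability and follows the EBC formula directly; for larger graphs a Brandes-style accumulation would be the usual choice.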