GraphC: Parameter-free Hierarchical Clustering of Signed Graph Networks v2

Muhieddine Shebaro, Lucas Rusnak, Martin Burtscher, Jelena Tešić
2024-11-01
Abstract: Spectral clustering methodologies, when extended to accommodate signed graphs, have encountered notable limitations in effectively encapsulating inherent grouping relationships. Recent findings underscore a substantial deterioration in the efficacy of spectral clustering methods when applied to expansive signed networks. We introduce a scalable hierarchical Graph Clustering algorithm denominated GraphC. This algorithm excels at discerning optimal clusters within signed networks of varying magnitudes. GraphC aims to preserve the positive edge fractions within communities during partitioning while concurrently maximizing the negative edge fractions between communities. Importantly, GraphC does not require a predetermined cluster count (denoted as k). Empirical substantiation of GraphC's efficacy is provided through a comprehensive evaluation involving fourteen datasets juxtaposed against ten baseline signed graph clustering algorithms. The algorithm's scalability is demonstrated through its application to extensive signed graphs drawn from Amazon-sourced datasets, each comprising tens of millions of vertices and edges. A noteworthy accomplishment is evidenced, with an average cumulative enhancement of 18.64% (consisting of the summation of positive edge fractions within communities and negative edge fractions between communities) over the second-best baseline for each respective signed graph. It is imperative to note that this evaluation excludes instances wherein all baseline algorithms failed to execute comprehensively.
Social and Information Networks
What problem does this paper attempt to address?
The main problem this paper addresses is the significant limitations that existing spectral clustering methods encounter on signed graphs. Specifically:

1. **Performance degradation of existing spectral clustering methods on large-scale sparse signed networks**: When existing spectral clustering methods are applied to large-scale sparse signed networks, their effectiveness drops significantly, especially on real-world data sources containing millions of nodes and edges.
2. **Difficulty in determining the number of communities**: Current signed-graph clustering algorithms usually require the number of communities \( k \) to be fixed in advance, which is a challenge in practice. Selecting an appropriate \( k \) depends on subjective judgment and is sensitive to network characteristics and other factors.
3. **Imbalance between positive and negative edges**: In most signed networks, the ratio of positive to negative edges is unbalanced. Existing clustering methods often cannot handle both edge types effectively at the same time, which degrades clustering quality.

To address these problems, the authors propose a parameter-free hierarchical clustering algorithm named GraphC. Its main contributions and features include:

- **No need to pre-define the number of communities**: GraphC does not require the number of communities \( k \) to be specified in advance, avoiding the subjectivity and uncertainty of choosing \( k \).
- **Ability to handle large-scale signed graphs**: GraphC can process signed graphs containing tens of millions of nodes and edges, demonstrating its scalability on large data sets.
- **Optimization of the balance between positive and negative edges**: GraphC considers both the proportion of positive edges within communities and the proportion of negative edges between communities, and aims to maximize both to improve clustering quality.
- **Innovation based on balance theory**: GraphC draws on balance theory and redefines the clustering problem through Harary cuts, so as to better capture the structural characteristics of signed networks.

In summary, this paper proposes a new signed-graph clustering algorithm, GraphC, which aims to overcome the limitations of existing spectral clustering methods on large-scale sparse signed networks. In particular, it clusters more effectively and maintains the balance between positive and negative edges without requiring the number of communities to be defined in advance.
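
The quality measure described above (positive edge fraction within communities plus negative edge fraction between communities) can be illustrated with a small sketch. This is not the GraphC algorithm itself, only one plausible way to score a given partition. It assumes that "positive edge fraction within communities" means the share of all positive edges whose endpoints land in the same community, and that "negative edge fraction between communities" means the share of all negative edges whose endpoints land in different communities. The function name `edge_fractions`, the edge-tuple layout, and the toy graph are hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge layout: (u, v, sign) with sign +1 or -1.
Edge = Tuple[Hashable, Hashable, int]


def edge_fractions(
    edges: Iterable[Edge], community: Dict[Hashable, int]
) -> Tuple[float, float]:
    """Return (positive fraction within, negative fraction between).

    Assumed definitions (not taken from the paper's text):
      positive fraction = positive edges inside a community / all positive edges
      negative fraction = negative edges across communities / all negative edges
    """
    pos_total = pos_in = neg_total = neg_out = 0
    for u, v, sign in edges:
        same = community[u] == community[v]
        if sign > 0:
            pos_total += 1
            pos_in += same          # bool counts as 0 or 1
        else:
            neg_total += 1
            neg_out += not same
    pos_frac = pos_in / pos_total if pos_total else 0.0
    neg_frac = neg_out / neg_total if neg_total else 0.0
    return pos_frac, neg_frac


# Toy signed graph with five vertices and a two-community partition.
edges = [
    ("a", "b", +1), ("b", "c", +1), ("d", "e", +1), ("b", "d", +1),
    ("c", "d", -1), ("a", "e", -1),
]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

pos_frac, neg_frac = edge_fractions(edges, community)
print(pos_frac, neg_frac, pos_frac + neg_frac)  # 0.75 1.0 1.75
```

In this toy partition, three of the four positive edges stay inside a community and both negative edges cross communities, so the combined score is 1.75 out of a maximum of 2. Under the assumed definitions, a partition that scores 2 places every positive edge inside a community and every negative edge between communities.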