Abstract: We study structural graph clustering, a fundamental problem in managing and analyzing graph data. Given an undirected, unweighted graph, structural graph clustering assigns vertices to clusters and also identifies the sets of hub vertices and outlier vertices, such that vertices in the same cluster are densely connected to each other while vertices in different clusters are loosely connected. In this paper, we develop a new two-step paradigm for scalable structural graph clustering based on three observations. We then present a $\mathsf{pSCAN}$ approach within this paradigm that aims to reduce the number of structural similarity computations, and we propose optimization techniques to speed up checking whether two vertices are structure-similar. $\mathsf{pSCAN}$ outputs exactly the same clusters as the existing approaches $\mathsf{SCAN}$ and $\mathsf{SCAN\text{++}}$, and we prove that $\mathsf{pSCAN}$ is worst-case optimal. Moreover, we propose efficient techniques for updating the clusters when the input graph changes dynamically, and we extend our techniques to other similarity measures, e.g., Jaccard similarity.
Performance studies on large real and synthetic graphs demonstrate the efficiency of our new approach and our dynamic cluster maintenance techniques. Notably, for the Twitter graph with 1 billion edges, our approach takes 25 minutes, while the state-of-the-art approach cannot finish even after 24 hours.
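As a rough illustration of the similarity test the abstract refers to, the sketch below computes the structural similarity commonly used by SCAN-style algorithms, i.e. the number of shared vertices in the closed neighborhoods of two vertices divided by the geometric mean of the neighborhood sizes, alongside Jaccard similarity. The abstract itself does not give these definitions, so treat them as assumptions; the adjacency-dictionary layout, the function names, and the threshold parameter `eps` are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def closed_neighborhood(adj, u):
    """Return N[u]: the neighbors of u plus u itself (assumed definition)."""
    return adj[u] | {u}

def structural_similarity(adj, u, v):
    """Shared closed neighbors divided by the geometric mean of the sizes."""
    nu, nv = closed_neighborhood(adj, u), closed_neighborhood(adj, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def jaccard_similarity(adj, u, v):
    """Shared closed neighbors divided by the size of the union."""
    nu, nv = closed_neighborhood(adj, u), closed_neighborhood(adj, v)
    return len(nu & nv) / len(nu | nv)

def is_structure_similar(adj, u, v, eps):
    """Hypothetical helper: true when the similarity reaches the threshold eps."""
    return structural_similarity(adj, u, v) >= eps

# Toy undirected graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

for u, v in [(0, 1), (0, 2), (2, 3)]:
    print(u, v,
          round(structural_similarity(adj, u, v), 4),
          round(jaccard_similarity(adj, u, v), 4))
```

In this toy graph the triangle edges score higher than the pendant edge (2, 3), which matches the intuition that vertices in the same cluster share most of their neighbors. The sketch evaluates every similarity in full and makes no attempt to reproduce the pruning or early-termination optimizations that the abstract attributes to $\mathsf{pSCAN}$.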

Efficient structural graph clustering: an index-based approach

Effective indexing for dynamic structural graph clustering

Index-based Structural Clustering on Directed Graphs

Manipulating Structural Graph Clustering

$\mathsf{pSCAN}$: Fast and Exact Structural Graph Clustering

Parallel Index-Based Structural Graph Clustering and Its Approximation

Dynamic Structural Clustering on Graphs

An Efficient Algorithm for Distance-based Structural Graph Clustering

Distributed structural clustering on large graph

Improved Graph Structure Clustering Algorithm by Using Parallel Strategy

DPSCAN: Structural Graph Clustering Based on Density Peaks

MapReduce-Based Graph Structural Clustering Algorithm

pm-SCAN: an I/O Efficient Structural Clustering Algorithm for Large-scale Graphs

Parallelizing Maximal Clique and K-Plex Enumeration over Graph Data

An Algorithm for Identifying Useful Structure in Graphs Clustering

Incremental Structural Clustering for Dynamic Networks

Efficient Maximal Clique Enumeration Over Graph Data

Insights and improvement to a structural clustering algorithm

ADPSCAN: Structural Graph Clustering with Adaptive Density Peak Selection and Noise Re-Clustering

Efficient Structural Clustering on Probabilistic Graphs

Parallelizing Maximal Clique Enumeration Over Graph Data