A big graph clustering method to support parallel processing by perceiving graph’s application algorithm semantics

Tengteng Cheng, Guosun Zeng, Zhipeng Sun
DOI: https://doi.org/10.1007/s11227-023-05572-x
IF: 3.3
2023-08-22
The Journal of Supercomputing
Abstract: As the size of graph data grows exponentially, it is usually processed in parallel and distributed computing environments. In parallel processing, the first step is to divide the graph data into many subgraphs and place them on different computational nodes. However, existing graph clustering methods are not designed for parallel processing and do not improve computing performance. The main reason is that these methods focus only on the graph data and ignore the graph’s application algorithm. Therefore, this paper proposes a graph clustering method that supports parallel processing by perceiving the semantics of the application algorithm. We focus specifically on the characteristic of "tight inside and loose outside", and based on this characteristic we give a new formula for calculating the modularity of a subgraph. In a graph application, the graph data and its associated application algorithm cannot be separated from each other, so we also consider the execution patterns of the application algorithm. Combining the modularity of a subgraph with the execution patterns of an application algorithm, we introduce the key concept of semantic serial degree as a new criterion for big graph clustering. We then propose a graph clustering method that balances the importance of the graph data against that of its associated application algorithm, which makes the final clustering results more suitable for parallel and distributed processing. Extensive experiments show that the proposed method is more general than traditional clustering methods and compatible with them. Compared with FastUnfolding, a state-of-the-art graph clustering method, the proposed method significantly reduces the completion time of the application algorithm.
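The abstract does not give the paper's new modularity formula or the definition of semantic serial degree, so neither is reproduced here. As background only, the following minimal sketch computes the classical Newman-Girvan modularity, a standard way to quantify "tight inside and loose outside" for a given partition. The function name `modularity`, the toy graph, and the partitions are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Classical Newman-Girvan modularity Q (NOT the paper's new formula).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
tight = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mixed = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}

q_tight = modularity(edges, tight)  # 5/14, about 0.357
q_mixed = modularity(edges, mixed)  # -3/14, about -0.214
```

The partition that keeps each triangle together scores higher than the one that scatters nodes across communities, which is the intuition behind "tight inside and loose outside". According to the abstract, the paper's criterion additionally accounts for the execution patterns of the application algorithm, which this sketch does not model.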
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture