Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis
Seung-Hee Bae,Daniel Halperin,Jevin D. West,Martin Rosvall,Bill Howe
DOI: https://doi.org/10.1145/2992785
IF: 4.157
2017-04-14
ACM Transactions on Knowledge Discovery from Data
Abstract:Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs. In this article, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on eight cores, and prioritization improves algorithm performance by an additional 20--50% on average, depending on the graph properties. Additionally, RelaxMap converges in the similar number of iterations and provides solutions of equivalent quality as the serial Infomap implementation.
computer science, information systems, software engineering