An Efficient Index-Based Approach to Distributed Set Reachability on Small-World Graphs.
Yuanyuan Zeng, Kenli Li, Xu Zhou, Wensheng Luo, Yunjun Gao
DOI: https://doi.org/10.1109/tpds.2021.3139111
IF: 5.3
2022-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Set reachability queries on directed graphs have many graph-based applications, such as dependency analysis and graph centrality calculation. Given two sets $S$ and $T$ of source and target vertices, a set reachability query returns all pairs $(s,t)$ where $s \in S$, $t \in T$, and $s$ can reach $t$. The state-of-the-art approach, distributed set reachability (DSR), investigates the set reachability query in a distributed environment and adopts a static graph-based index to improve query efficiency. Nevertheless, DSR needs to store the graph-based index in all partitions, which causes a huge space overhead. Furthermore, it cannot efficiently answer a negative query $(s,t)$, where $s$ cannot reach $t$, since DSR must traverse all reachable paths and cannot efficiently reduce the computation. To alleviate these issues, we propose a novel multi-level 2-hop (ML2hop) index for the set reachability query in a distributed environment. Based on ML2hop, we further present a bi-directional query algorithm, called MLQA, that efficiently supports both positive and negative queries in Pregel-like systems. MLQA has the following three significant properties: (1) Low computation costs. It reduces redundant local computations in each partition by controlling the rounds of path traversals. (2) Low communication costs. It restricts message exchange among partitions to a single round while guaranteeing the accuracy of query results. (3) High parallelism. It adopts a bi-directional query technique for message propagation, achieving better query efficiency than the forward-traversal query strategy used in DSR. Experimental results on several real-world graphs demonstrate that MLQA outperforms the state-of-the-art algorithm by up to two orders of magnitude.
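To make the query definition concrete, the following is a minimal single-machine sketch in Python. It is not the paper's ML2hop index or the MLQA algorithm, whose details the abstract does not give. It shows a brute-force baseline that answers a set reachability query by one BFS per source, followed by the generic 2-hop labeling test (s reaches t exactly when the out-label of s and the in-label of t share a vertex). All function names, the adjacency-dict graph format, the deliberately naive label construction, and the convention that a vertex reaches itself are assumptions made for illustration.

```python
from collections import deque


def reachable_from(graph, source):
    """Return every vertex reachable from `source` via BFS.

    Assumption: a vertex is considered to reach itself.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def set_reachability(graph, sources, targets):
    """Brute-force baseline: all pairs (s, t) with s in S, t in T, s reaches t."""
    targets = set(targets)
    pairs = set()
    for s in sources:
        for t in reachable_from(graph, s) & targets:
            pairs.add((s, t))
    return pairs


def build_naive_labels(graph):
    """Build a trivially valid (but far from minimal) 2-hop labeling.

    l_out[v] holds everything v reaches; l_in[v] holds everything reaching v.
    A practical 2-hop index stores much smaller hub sets; this construction
    only illustrates the query test.
    """
    vertices = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    reverse = {v: [] for v in vertices}
    for u, nbrs in graph.items():
        for v in nbrs:
            reverse[v].append(u)
    l_out = {v: reachable_from(graph, v) for v in vertices}
    l_in = {v: reachable_from(reverse, v) for v in vertices}
    return l_out, l_in


def two_hop_reaches(l_out, l_in, s, t):
    """2-hop test: s reaches t iff the two labels share at least one vertex."""
    return not l_out[s].isdisjoint(l_in[t])


# Hypothetical example graph: a -> b -> c and d -> c.
example_graph = {"a": ["b"], "b": ["c"], "d": ["c"], "c": []}
example_pairs = set_reachability(example_graph, {"a", "d"}, {"b", "c"})
example_l_out, example_l_in = build_naive_labels(example_graph)
```

On the example graph, the pair (d, b) is a negative query: the baseline must exhaust the BFS from d before it can rule the pair out, whereas the label test answers it with a single set intersection. This is the kind of cost difference on negative queries that the abstract attributes to DSR's path traversal.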