Optimal Representation for Web and Social Network Graphs Based on K²-Tree

Fengying Li, Qi Zhang, Tianlong Gu, Rongsheng Dong
DOI: https://doi.org/10.1109/ACCESS.2019.2912172
IF: 3.9
2019-01-01
IEEE Access
Abstract: With the rapid growth of the Internet, the scale of graphs has increased dramatically, which poses special challenges in representing both web graphs and social network graphs. In the adjacency matrices of web and social network graphs, only a very small proportion of the elements are 1s. Furthermore, we find that aggregating the scattered 1s into high-density regions of the adjacency matrix benefits the compression of storage space. Based on these findings, we propose the DGC-K²-tree compression approach, built on the K²-tree, which greatly increases the density of 1s relative to existing algorithms and adequately compresses the blank areas of the adjacency matrix. We then design a query algorithm for this mechanism to support operations on the graph. The experimental results show that, compared with state-of-the-art algorithms, including the K²-tree based on a diagonal clustering mechanism (K²-BDC), the K²-tree, Re-Pair, and LZ78, our approach achieves a better compression ratio and lower time consumption. In terms of storage efficiency, our approach reduces space by an average of 34.07% compared to K²-BDC, the best-performing of these algorithms in storage. In terms of query efficiency, our approach reduces time by an average of 80.63% compared to LZ78, the best-performing of these algorithms in query time.
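For readers unfamiliar with the baseline structure, the following is a minimal sketch of a plain K²-tree (with K = 2) as it is commonly described in the literature. It is not the DGC-K²-tree proposed in the paper, whose clustering step is not specified in the abstract. The function names `build_k2tree` and `has_edge` are hypothetical, and the rank operation is computed naively with `sum` rather than with a succinct rank structure.

```python
def build_k2tree(edges, n, k=2):
    """Build the bit arrays (T, L) of a plain K^2-tree for an n x n adjacency matrix.

    edges: iterable of (row, col) cells that hold a 1.
    Returns (T, L, size), where size is n padded up to a power of k.
    """
    size = k
    while size < n:
        size *= k
    T, L = [], []  # T: internal levels, L: last level (individual cells)
    level = [(0, 0, size, list(edges))]  # (row0, col0, side, cells inside)
    while level:
        next_level = []
        for r0, c0, side, cells in level:
            sub = side // k
            buckets = [[] for _ in range(k * k)]
            for r, c in cells:
                buckets[((r - r0) // sub) * k + (c - c0) // sub].append((r, c))
            for i, bucket in enumerate(buckets):
                bit = 1 if bucket else 0
                if sub == 1:
                    L.append(bit)  # leaf level: one bit per matrix cell
                else:
                    T.append(bit)  # internal level: 1 means "submatrix has a 1"
                    if bit:
                        next_level.append(
                            (r0 + (i // k) * sub, c0 + (i % k) * sub, sub, bucket)
                        )
        level = next_level
    return T, L, size


def has_edge(T, L, size, r, c, k=2):
    """Check whether cell (r, c) is 1 by walking down from the root."""
    bits = T + L
    base, sub = 0, size // k
    while True:
        p = base + (r // sub) * k + (c // sub)
        if bits[p] == 0:
            return False  # an all-zero submatrix is cut off here
        if sub == 1:
            return True  # reached a single cell in L
        # Children of the node at position p start at rank1(T, p) * k^2.
        base = sum(T[: p + 1]) * k * k
        r, c, sub = r % sub, c % sub, sub // k


edges = [(0, 1), (1, 0), (2, 3), (3, 3)]
T, L, size = build_k2tree(edges, n=4)
```

For this 4 × 4 example, `T` is `[1, 0, 0, 1]` and `L` is `[0, 1, 1, 0, 0, 1, 0, 1]`: the two empty quadrants cost one bit each, which illustrates why concentrating the 1s into fewer submatrices, as the abstract describes, tends to shrink the tree.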