Inferring Higher-Order Structure Statistics of Large Networks from Sampled Edges

Pinghui Wang, Yiyan Qi, John C. S. Lui, Don Towsley, Junzhou Zhao, Jing Tao
DOI: https://doi.org/10.1109/tkde.2017.2685584
IF: 9.235
2017-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Exploring locally connected subgraphs (also known as motifs or graphlets) of complex networks has recently attracted a lot of attention. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, researchers sometimes have to deal with situations where the graph topology is unknown because it is expensive to collect and store all topological information. Hence, what is typically available to researchers is only a snapshot of the graph, i.e., a subgraph of the graph. Crawling methods such as breadth-first sampling can be used to generate the snapshot. However, these methods fail to sample a streaming graph represented as a high-speed stream of edges. Therefore, graph mining applications such as network traffic monitoring usually use random edge sampling (i.e., sampling each edge with a fixed probability) to collect edges and generate a sampled graph, which we call a “RESampled graph”. Clearly, a RESampled graph's motif statistics may be quite different from those of the original graph. To resolve this, we propose a framework, Minfer, which takes the given RESampled graph and accurately infers the underlying graph's motif statistics. Experiments using large-scale datasets show the accuracy and efficiency of our method.
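To illustrate why a RESampled graph's motif statistics differ from the original's, and how they can in principle be corrected, here is a minimal sketch restricted to 3-node undirected motifs. It is not the paper's Minfer implementation. It assumes each edge is kept independently with probability p, as in the random edge sampling described above. Under that assumption, a triangle in the original graph survives intact only if all three edges are kept (probability p^3), and it degrades to an open wedge with probability 3p^2(1-p). An open wedge survives with probability p^2. Inverting these expectations gives unbiased estimates of the original counts. All function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_edges(edges, p, seed=None):
    """Random edge sampling: keep each edge independently with probability p.

    Works on any iterable of (u, v) pairs, so it can consume an edge stream.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]


def count_3node_motifs(edges):
    """Return (triangles, open_wedges) for a simple undirected graph.

    An open wedge is a connected 3-node induced subgraph with exactly 2 edges.
    Node labels must be mutually comparable (e.g. ints).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Every pair of neighbors of a node forms a wedge (open or closed).
    all_wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    # Each triangle is seen once per edge, i.e. three times in total.
    tri_times_3 = sum(
        len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v
    )
    triangles = tri_times_3 // 3
    # Each triangle contains three closed wedges.
    open_wedges = all_wedges - 3 * triangles
    return triangles, open_wedges


def estimate_original_counts(t_sampled, w_sampled, p):
    """Unbiased estimates of original (triangles, open_wedges).

    Uses E[T'] = p^3 T and E[W'] = 3 p^2 (1 - p) T + p^2 W.
    """
    t_hat = t_sampled / p**3
    w_hat = (w_sampled - 3 * p**2 * (1 - p) * t_hat) / p**2
    return t_hat, w_hat


if __name__ == "__main__":
    # Hypothetical example: complete graph on 20 nodes (many triangles, no open wedges).
    graph = list(combinations(range(20), 2))
    p, trials = 0.3, 500
    t_sum = w_sum = 0.0
    for i in range(trials):
        t_s, w_s = count_3node_motifs(sample_edges(graph, p, seed=i))
        t_hat, w_hat = estimate_original_counts(t_s, w_s, p)
        t_sum += t_hat
        w_sum += w_hat
    print("true counts:", count_3node_motifs(graph))
    print("mean estimates:", t_sum / trials, w_sum / trials)
```

Averaged over many independent samples, the estimates should approach the true counts, while the raw sampled counts show far fewer triangles and many spurious open wedges. The paper's point is that for larger motifs these transitions between motif types become much more involved, which is what a dedicated framework has to handle.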
What problem does this paper attempt to address?