Implementing quasi-parallel breadth-first search in MapReduce for large-scale social network mining

Lianghong Qian,Lei Fan,Jianhua Li
DOI: https://doi.org/10.1109/CASoN.2013.6622606
2013-01-01
Abstract:Online social networks like Weibo and Twitter consist of billions of users and connections, and traditional approaches which are based on serial algorithms and leveraged only a single node or even a single core cannot suffice the that scale of data any more. We propose new distributed quasi-parallel breadth-first search scheme, the common graph traversal algorithm, based on the MapReduce framework, which has better performance (up to one scale of magnitude less time complexity for single-source cases or even better for multiple-source cases) than Pegasus, the state-of-the-art graph mining library, in terms of the complexity of computation and the I/O load. We apply our algorithms on the Weibo dataset, crawled from its website, which contains 135 million users and 10.2 billion directed connections among them, and occupies up to 400 gigabytes. The dataset is by far the largest one of online social networks in research. Based on the Weibo dataset with extremely skewed degree distribution, we give the empirical time complexity and I/O load analysis in each iteration of our proposed methods. Also, We ran the experiments on a 20-node Hadoop cluster to validate our analysis, and the results conform to our predicted empirical results.
What problem does this paper attempt to address?