KANGAROO: A Distributed System for SNA - Social Network Analysis in Huge-scale Networks

Bin Wu, Yuxiao Dong, Lei Qin, Qing Ke, Bai Wang
DOI: https://doi.org/10.5220/0003387304040409
2011-01-01
Abstract: Social network analysis is the mapping and measuring of relationships and flows between people, groups, computers and other information or knowledge entities. The continued exponential growth in the scale of social networks poses a new challenge to social network analysis. In some cases, these graphs reach millions of nodes and billions of edges. In this paper, we present KANGAROO, a distributed system for huge-scale social network analysis built on two main computing models: one for finding common neighbours and one for finding maximal cliques. KANGAROO is implemented on top of the Hadoop platform, an open-source implementation of MapReduce. The system implements most common social network analysis algorithms, including basic statistics, community detection, link prediction and network evolution, on the MapReduce computing framework. Moreover, KANGAROO is applied to a real-world huge-scale social network. The application scenarios, including degree distribution, a linear projection algorithm for community detection, and community visualization in the presentation layer, demonstrate that KANGAROO is efficient, scalable and effective.
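To make the common-neighbour computing model more concrete, the following is a minimal single-machine sketch in Python of how such a computation can be phrased as a map step and a reduce step. It is an illustration under stated assumptions, not KANGAROO's actual implementation: the function names (`map_phase`, `reduce_phase`, `common_neighbours`) and the in-memory edge list are hypothetical, and a real Hadoop job would distribute both phases across a cluster.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(adjacency):
    """Map step (assumed design): each node emits every pair of its
    neighbours as a key, with the node itself as the value. The node is
    by construction a common neighbour of that pair."""
    for node, neighbours in adjacency.items():
        for u, v in combinations(sorted(neighbours), 2):
            yield (u, v), node


def reduce_phase(pairs):
    """Reduce step: group emitted values by key, so each node pair ends
    up with the full set of its common neighbours."""
    grouped = defaultdict(set)
    for key, common in pairs:
        grouped[key].add(common)
    return dict(grouped)


def common_neighbours(edges):
    """Build an undirected adjacency list from an edge list, then run
    the two phases in sequence (self-loops are ignored)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return reduce_phase(map_phase(adjacency))


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4.
    edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    for pair, shared in sorted(common_neighbours(edges).items()):
        print(pair, sorted(shared))
```

Keying the intermediate records by node pair means that all evidence for one pair arrives at a single reducer, which is the property that lets this kind of computation scale out. The cost is that a node of degree d emits on the order of d² pairs, so very high-degree nodes may need special handling in practice.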