Mining the Network of the Programmers

Yezhou Ma, Huiying Li, Jiyao Hu, Rong Xie, Yang Chen
DOI: https://doi.org/10.1145/3127404.3127431
2017-01-01
Abstract: GitHub is a website used worldwide for version control and source code management. In addition, because its users can follow each other, it also forms a professional social network of millions of users. In this work, we perform a data-driven study of the GitHub network. By introducing a distributed crawling framework, we first collect profiles and behavioral data of more than 2 million GitHub users. To the best of our knowledge, this is the largest and most recent public dataset of GitHub. Then, we build the social graph of these users and conduct a thorough analysis of the network structure. Moreover, we investigate user behavior patterns, particularly the patterns of "commit" activities. Finally, we utilize machine learning methods to discover important users in the network with high accuracy and low overhead. Our findings can help GitHub provide better services for its users.