GEMiner: Mining Social and Programming Behaviors to Identify Experts in Github

Wenkai Mo,Beijun Shen,Yuming He,Hao Zhong
DOI: https://doi.org/10.1145/2875913.2875924
2015-01-01
Abstract:Hosting over 10 million repositories, GitHub becomes the largest open source community in the world. Besides sharing code, Github is also a social network, in which developers can follow others or keep track of their interested projects. Considering the multi-roles of Github, integrating heterogenous data of each developer to identify experts is a challenging task. In this paper, we propose GEMiner, a novel approach to identify experts for some specific programming languages in Github. Different from previous approaches, GEMiner analyzes the social behaviors and programming behaviors of a developer to determine the expertise of the developer. When modeling social behaviors of developers, to integrate heterogenous social networks in Github, GEMiner implements a Multi-Sources PageRank algorithm. Also, GEMiner analyzes the behaviors of developers when they are programming (e.g., their commit activities and their preferred programming languages) to model programming behaviors of them. Based on our expertise models and our extracted programming languages data, GEMiner can then identify experts for some specific programming languages in Github. We conducted experiments on a real data set, and our results show that GEMiner identifies experts with 60% accuracy higher than the state-of-the-art algorithms.
What problem does this paper attempt to address?