Developer Identity Linkage and Behavior Mining Across GitHub and StackOverflow.

Yunxiang Xiong, Zhangyuan Meng, Beijun Shen, Wei Yin
DOI: https://doi.org/10.1142/s0218194017400034
2017-01-01
Abstract: Software developers are increasingly active on GitHub and StackOverflow, generating a large amount of valuable data in the two communities. Researchers mine the information in these software communities to understand developer behavior, but previous work has mainly focused on mining data within a single community. In this paper, we propose a novel approach to developer identity linkage and behavior mining across GitHub and StackOverflow. The approach links accounts from the two communities using a CART decision tree that leverages features derived from usernames, user behaviors and writing styles. It then explores cross-site developer behavior through [Formula: see text]-graph analysis, LDA-based topic clustering and cross-site tagging. We conducted several experiments to evaluate this approach. The results show that the precision and F-score of our identity linkage method are higher than those of previous methods in software communities. In particular, we found that (1) active issue committers are also active question askers; (2) for most developers, the topics of their content on GitHub are similar to those of their questions and answers on StackOverflow; (3) developers’ concerns on StackOverflow shift over time along with the projects they are currently participating in on GitHub; (4) developers’ concerns on GitHub are more relevant to their answers on StackOverflow than to their questions and comments.
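The identity-linkage step can be illustrated with a minimal sketch. It assumes that each candidate (GitHub, StackOverflow) account pair is described by two hypothetical features: a username similarity computed with Python's `difflib.SequenceMatcher`, and a character-trigram cosine similarity used as a crude writing-style proxy. It then trains a small CART-style tree (binary threshold splits chosen by Gini impurity). The feature definitions, the helper names (`username_sim`, `trigram_cosine`, `pair_features`, `build_tree`, `predict`) and the toy account pairs are illustrative assumptions, not the authors' features, implementation or data; user-behavior features are omitted.

```python
from collections import Counter
from difflib import SequenceMatcher
import math


def username_sim(a, b):
    """Case-insensitive username similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_cosine(text_a, text_b):
    """Cosine similarity of character-trigram counts (hypothetical writing-style proxy)."""
    def grams(t):
        t = t.lower()
        return Counter(t[i:i + 3] for i in range(len(t) - 2))

    ga, gb = grams(text_a), grams(text_b)
    dot = sum(count * gb[g] for g, count in ga.items())
    na = math.sqrt(sum(v * v for v in ga.values()))
    nb = math.sqrt(sum(v * v for v in gb.values()))
    return dot / (na * nb) if na and nb else 0.0


def pair_features(gh_name, so_name, gh_text, so_text):
    """Hypothetical feature vector for one candidate (GitHub, StackOverflow) account pair."""
    return [username_sim(gh_name, so_name), trigram_cosine(gh_text, so_text)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or w < best[0]:
                best = (w, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a CART-style tree: leaves are 0/1, inner nodes are (feature, threshold, left, right)."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return majority  # no split reduces impurity
    _, f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def predict(tree, x):
    """Walk the tree; 1 means 'same developer', 0 means 'different developers'."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Toy labelled pairs (hypothetical accounts): (gh_name, so_name, gh_text, so_text, label)
pairs = [
    ("jdoe", "jdoe", "fix null pointer in parser", "null pointer in parser on empty input", 1),
    ("alice-dev", "alicedev", "add unit tests for cache", "how to unit test a cache layer", 1),
    ("jdoe", "mkumar", "fix null pointer in parser", "css grid alignment question", 0),
    ("alice-dev", "bob99", "add unit tests for cache", "docker networking between containers", 0),
]
X = [pair_features(*p[:4]) for p in pairs]
y = [p[4] for p in pairs]
tree = build_tree(X, y)
print(tree)
print(predict(tree, pair_features("carol_w", "carolw", "", "")))
```

With only four toy pairs the tree splits on username similarity alone; a realistic linker would need many labelled pairs and richer behavior and style features, and a library learner such as scikit-learn's `DecisionTreeClassifier` could replace the hand-written tree.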