A Method for Identifying References Between Projects in GitHub

Baochuan Liu, Li Zhang, Jing Jiang, Liang Wang
DOI: https://doi.org/10.1016/j.scico.2022.102858
IF: 1.039
2022-01-01
Science of Computer Programming
Abstract: On open source software platforms, projects rarely develop in isolation; they depend on each other and evolve together. Identifying references between projects is important in software development activities, because it may help projects identify cross-project bugs or attract new contributors from related projects. In this paper, we propose IREL, a method to Identify References between projects by Extracting Links. We first extract links from the descriptions and comments of issues, pull requests, and commits using three matching patterns. We then identify changes in project names and replace the original project names with the new ones. Finally, we identify references between projects by selecting links whose source and target projects differ. We evaluate the method on datasets with 20,347,228 projects. IREL obtains 934,322 references, 26.461 times as many as the Reference Coupling method and 16.483 times as many as the Issue Units method. Project PageRank scores based on the references identified by IREL are more strongly correlated with the number of stars of projects. Our method helps researchers identify references more effectively. (C) 2022 Elsevier B.V. All rights reserved.
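The three steps described in the abstract (extract links, apply project renames, keep only cross-project links) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the three matching patterns, so the regular expressions below are assumptions modeled on GitHub's common reference forms (full URL, `owner/repo#123`, `owner/repo@sha`), and the names `extract_targets`, `identify_references`, and the `renames` mapping are hypothetical.

```python
import re

# ASSUMPTION: the abstract does not specify IREL's three patterns.
# These follow common GitHub reference forms and are illustrative only.
URL_PATTERN = re.compile(
    r"https?://github\.com/([\w.-]+/[\w.-]+)/(?:issues|pull|commit)/\w+"
)
ISSUE_PATTERN = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)#\d+")
COMMIT_PATTERN = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)@[0-9a-f]{7,40}\b")


def extract_targets(text):
    """Step 1: return every 'owner/repo' name that the text links to."""
    targets = []
    for pattern in (URL_PATTERN, ISSUE_PATTERN, COMMIT_PATTERN):
        targets.extend(match.group(1) for match in pattern.finditer(text))
    return targets


def identify_references(records, renames):
    """Return the set of (source, target) pairs with source != target.

    records: iterable of (source_project, text) pairs, where text is a
             description or comment from an issue, pull request, or commit.
    renames: hypothetical mapping from an old project name to its new name.
    """
    renames = {old.lower(): new.lower() for old, new in renames.items()}

    def current_name(name):
        # Step 2: follow rename chains, guarding against cycles.
        name = name.lower()
        seen = set()
        while name in renames and name not in seen:
            seen.add(name)
            name = renames[name]
        return name

    references = set()
    for source, text in records:
        src = current_name(source)
        for target in extract_targets(text):
            tgt = current_name(target)
            # Step 3: keep only links that cross project boundaries.
            if tgt != src:
                references.add((src, tgt))
    return references


records = [
    ("alice/app", "Fixed upstream in https://github.com/bob/lib/pull/42, "
                  "see also alice/app#7"),
    ("carol/tool", "Regression from bob/old-lib@1a2b3c4d"),
]
renames = {"bob/old-lib": "bob/lib"}
references = identify_references(records, renames)
```

In this toy input, the self-link `alice/app#7` is dropped in the last step, and the link to the renamed `bob/old-lib` is attributed to `bob/lib`, so both source projects end up referencing the same target. The resulting edge set is the kind of directed graph on which PageRank scores could then be computed.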