Automatic Tagging for Open Source Software by Utilizing Package Dependency Information

Liu Yang, Li Wang, Zhigang Hu, Yanwen Wang, Jun Long
DOI: https://doi.org/10.1109/TASE49443.2020.00027
2020-01-01
Abstract: Tags are important for managing and retrieving the massive amount of open-source software (OSS) in the OSS community, and untagged repositories make managing and retrieving OSS on GitHub difficult. However, developers sometimes neglect to write tags for their repositories. For example, in our collected dataset of over 43K GitHub repositories, more than 32% of the repositories are unlabeled. To alleviate this problem, we propose an approach that automatically generates repository tags based on a neural network and LDA by utilizing the package dependencies among OSS in the community together with readme files. We design an algorithm for extracting the tag features of dependent OSS packages and build dependency feature vectors for OSS. We then combine these vectors with the topics of the OSS readme file as input to train the neural network, obtain the tag probability distribution for each OSS, and subsequently recommend tags. We evaluate our approach on the dataset of over 43K repositories that we collected from GitHub. Experimental results show that our approach, DepTagRec, performs better than other methods in terms of precision and recall, particularly in recall when recommending the top 10 tags for OSS.
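The feature-building step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a repository's dependency feature vector is the normalized frequency of tags carried by its dependent packages, and that this vector is concatenated with a readme topic vector (here a hard-coded placeholder standing in for LDA output). The function names, the normalization, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def dependency_tag_vector(dependencies, package_tags, vocabulary):
    """Normalized frequency of each vocabulary tag across a repo's dependencies.

    Assumption: tag frequency is used as the dependency feature; the paper's
    actual extraction algorithm may differ.
    """
    counts = Counter()
    for dep in dependencies:
        # Dependencies with no known tags contribute nothing.
        counts.update(package_tags.get(dep, []))
    total = sum(counts[tag] for tag in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[tag] / total for tag in vocabulary]


def build_input(dep_vector, topic_vector):
    """Concatenate dependency features with readme topic features."""
    return list(dep_vector) + list(topic_vector)


# Toy data (hypothetical): tags already attached to dependent packages.
package_tags = {
    "numpy": ["python", "scientific-computing"],
    "flask": ["python", "web"],
}
vocabulary = ["python", "web", "scientific-computing"]

dep_vec = dependency_tag_vector(
    ["numpy", "flask", "unknown-pkg"], package_tags, vocabulary
)
topic_vec = [0.7, 0.3]  # placeholder for an LDA topic distribution of the readme
features = build_input(dep_vec, topic_vec)
```

In a full pipeline, `features` would be the input to a classifier that outputs a probability per tag, from which the top-k tags are recommended.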