Software Effort Estimation Based on Open Source Projects: Case Study of Github

Fumin Qi, Xiao-Yuan Jing, Xiaoke Zhu, Xiaoyuan Xie, Baowen Xu, Shi Ying
DOI: https://doi.org/10.1016/j.infsof.2017.07.015
IF: 3.9
2017-01-01
Information and Software Technology
Abstract: Context: Managers usually want to estimate the effort of a new project in advance so that they can allocate limited resources sensibly. In practice, it is common to train a prediction model on effort datasets to predict the effort a project requires. Sufficient data is the basis for training a good estimator, yet most data owners are unwilling to share their closed source project (CSP) effort data because of privacy concerns, so only a small amount of effort data can be obtained. Effort estimators built on such limited data usually cannot satisfy practical requirements. Objective: We aim to provide a method for collecting sufficient data, addressing the lack of training data when building an effort estimation model. Method: We propose mining GitHub to collect sufficient and diverse real-life effort data for effort estimation. Specifically, we first demonstrate the feasibility of our cost metrics (including function point analysis and personnel factors). In particular, we design a quantitative method for evaluating the personnel metrics based on GitHub data. We then design a sample-incremental approach based on AdaBoost and Classification And Regression Tree (ABCART) that gives the collected dataset a dynamic expansion capability. Results: Experimental results on the collected dataset show that: (1) the personnel factor helps improve the performance of effort estimation; (2) the proposed ABCART algorithm can increase the number of samples in the collected dataset online; (3) estimators built on the collected data achieve performance comparable to that of estimators built on existing effort datasets. Conclusions: Effort estimation based on Open Source Projects (OSP) is an effective way to obtain the effort required by a new project, especially when training data is lacking. (C) 2017 Elsevier B.V. All rights reserved.
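To make the idea of combining AdaBoost with regression trees for sample-incremental effort data more concrete, the following is a minimal sketch and not the authors' ABCART implementation. It assumes an AdaBoost.R2-style boosting loop over depth-1 regression trees (stumps) fitted directly on weighted samples, and a purely hypothetical admission rule: a candidate project joins the dataset when the ensemble's relative error on it is within a tolerance. The function names, the `tol` parameter, and the toy features (function points, a personnel score) are illustrative assumptions; the abstract does not specify any of them.

```python
import math

def fit_stump(X, y, w):
    """Fit a weighted depth-1 regression tree (a minimal stand-in for CART)."""
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    best = (math.inf, 0, math.inf, mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            m_l = sum(w[i] * y[i] for i in left) / sum(w[i] for i in left)
            m_r = sum(w[i] * y[i] for i in right) / sum(w[i] for i in right)
            sse = (sum(w[i] * (y[i] - m_l) ** 2 for i in left)
                   + sum(w[i] * (y[i] - m_r) ** 2 for i in right))
            if sse < best[0]:
                best = (sse, j, thr, m_l, m_r)
    return best[1:]

def predict_stump(stump, x):
    j, thr, m_l, m_r = stump
    return m_l if x[j] <= thr else m_r

def fit_adaboost_r2(X, y, n_rounds=20):
    """AdaBoost.R2-style boosting with linear loss over weighted stumps."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []  # list of (learner weight, stump)
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        errors = [abs(yi - predict_stump(stump, xi)) for xi, yi in zip(X, y)]
        max_err = max(errors)
        if max_err == 0:  # perfect fit: keep it and stop
            ensemble.append((1.0, stump))
            break
        losses = [e / max_err for e in errors]
        avg_loss = sum(wi * li for wi, li in zip(w, losses))
        if avg_loss >= 0.5:  # weak learner too weak: stop boosting
            if not ensemble:
                ensemble.append((1.0, stump))
            break
        beta = avg_loss / (1 - avg_loss)
        ensemble.append((math.log(1 / beta), stump))
        # Shrink the weights of well-predicted samples, then renormalize.
        w = [wi * beta ** (1 - li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted median of the stump predictions."""
    preds = sorted((predict_stump(s, x), a) for a, s in ensemble)
    half = sum(a for _, a in preds) / 2
    acc = 0.0
    for p, a in preds:
        acc += a
        if acc >= half:
            return p

def try_add_sample(X, y, ensemble, x_new, y_new, tol=0.25):
    """Hypothetical admission rule (an assumption, not from the paper):
    accept the candidate if the relative error is within tol, then refit."""
    rel_err = abs(predict(ensemble, x_new) - y_new) / y_new
    if rel_err > tol:
        return ensemble, False
    X.append(x_new)
    y.append(y_new)
    return fit_adaboost_r2(X, y), True

# Toy data: [function points, personnel score] -> effort (made-up numbers).
X = [[10, 1], [20, 1], [30, 2], [40, 2], [50, 3], [60, 3]]
y = [100, 210, 290, 405, 500, 610]
model = fit_adaboost_r2(X, y)
print(predict(model, [45, 2]))
```

Fitting the stumps on sample weights directly, rather than resampling by weight, keeps the sketch deterministic; a production setup would more likely rely on an established library's boosted regression trees.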