Cost Adjustment for Software Crowdsourcing Tasks Using Ensemble Effort Estimation and Topic Modeling

Anum Yasmin
DOI: https://doi.org/10.1007/s13369-024-08746-8
IF: 2.807
2024-02-28
Arabian Journal for Science and Engineering
Abstract: Crowdsourced software development (CSSD) has been a fast-growing field among software practitioners and researchers over the last two decades. Despite this favorable environment, no intelligent mechanism exists to assign prices to CSSD tasks. Software development effort estimation (SDEE), on the other hand, is already an established field in traditional software engineering. SDEE is largely facilitated by machine learning (ML), particularly ML-based ensemble effort estimation (EEE), which targets accurate estimates by avoiding the biases of a single ML model. This accuracy of EEE can be exploited on CSSD platforms to establish an intelligent cost assignment mechanism. This study aims to integrate EEE with a CSSD platform to provide a justified costing solution for crowdsourced tasks. An effort-based cost estimation model is proposed that implements EEE to predict a task's effort, along with natural language processing (NLP) analysis of the task's textual description, to assign an effort-based cost. TopCoder is selected as the target CSSD platform, and the proposed scheme is implemented on the TopCoder QA category, which comprises software testing tasks. Ensemble prediction is incorporated via random forest, support vector machine and neural network as base learners. Latent Dirichlet allocation (LDA) topic modeling is used for NLP analysis of the textual aspects of CSSD tasks, with a specific emphasis on testing and technology factors. Effort estimation results confirm that EEE models, particularly stacking and weighted ensemble, surpass their base learners with an overall accuracy increase of 50%. Moreover, R², log-likelihood and topic quality measures confirm considerable LDA model significance. The findings confirm that the cost adjustment achieved from EEE and NLP defines an acceptable price range, covering major testing aspects.
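The weighted ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are derived, so the inverse-error weighting below is an assumption, not the authors' method. The effort values, the base-learner predictions and all function names are hypothetical and serve only to show how three base estimates might be combined into one.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted effort."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def inverse_error_weights(actual, base_predictions):
    """Weight each base learner by the inverse of its validation MAE.

    Assumption: this weighting rule is illustrative; the abstract does not
    specify the scheme used in the paper. Weights are normalised to sum to 1.
    """
    eps = 1e-9  # guards against division by zero for a perfect learner
    inverse = {
        name: 1.0 / (mean_absolute_error(actual, preds) + eps)
        for name, preds in base_predictions.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_ensemble(weights, base_predictions):
    """Combine base-learner predictions into one estimate per task."""
    n_tasks = len(next(iter(base_predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in base_predictions.items())
        for i in range(n_tasks)
    ]


# Hypothetical validation data: actual effort and base-learner predictions.
actual_effort = [10.0, 20.0, 30.0, 40.0]
validation_predictions = {
    "random_forest": [12.0, 18.0, 33.0, 41.0],
    "svm": [8.0, 25.0, 27.0, 44.0],
    "neural_network": [11.0, 21.0, 29.0, 38.0],
}

weights = inverse_error_weights(actual_effort, validation_predictions)
ensemble_effort = weighted_ensemble(weights, validation_predictions)

print(weights)
print(ensemble_effort)
```

In practice the weights would be fitted on held-out tasks and then applied to predictions for new tasks; a stacking ensemble would instead train a meta-learner on the base predictions.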
multidisciplinary sciences