Abstract: This investigation refines software effort estimation (SEE) to improve project outcomes amid the rapid evolution of the software industry. Accurate estimation is a cornerstone of project success, crucial for avoiding budget overruns and reducing the risk of project failure. The framework proposed in this article addresses three issues critical to accurate estimation: handling missing or inadequate data, selecting key features, and improving the software effort model. It comprises three methods: the Novel Incomplete Value Imputation Model (NIVIM), a hybrid model combining Correlation-based Feature Selection with a meta-heuristic algorithm (CFS-Meta), and the Heterogeneous Ensemble Model (HEM). Together, they improve the robustness and accuracy of SEE across varying project scenarios by handling missing data, optimizing feature selection, and integrating diverse predictive models. The framework significantly reduces imputation and feature selection overhead, while the ensemble approach optimizes model performance through dynamic weighting and meta-learning. The result is lower mean absolute error (MAE) and reduced computational complexity, making the framework more effective for diverse software datasets. NIVIM is designed to address the incomplete datasets prevalent in SEE. By integrating a synthetic data methodology through a Variational Auto-Encoder (VAE), the model incorporates both contextual relevance and intrinsic project features, significantly enhancing estimation precision. Comparative analyses show that NIVIM outperforms existing models such as VAE, GAIN, K-NN, and MICE, achieving statistically significant improvements across six benchmark datasets, with average RMSE improvements ranging from 11.05% to 17.72% and MAE improvements from 9.62% to 21.96%.
CFS-Meta balances global optimization with local search techniques, substantially enhancing predictive capability. Compared against single and hybrid feature selection models, it achieves up to a 25.61% reduction in MSE. In terms of MAE, it improves on the hybrid PSO-SA model by 10%, on the hybrid ABC-SA model by 11.38%, and on the hybrid Tabu-GA and hybrid ACO-COA models by 12.42% and 12.703%, respectively. The third method, HEM, is an ensemble effort estimation (EEE) model that combines diverse standalone models through a Dynamic Weight Adjustment-stacked combination (DWSC) rule. Tested against international benchmarks and industry datasets, HEM improves on the standalone model by an average of 21.8% (Pred()) and on the homogeneous ensemble model by 15% (Pred()). This comprehensive methodology underscores our model's contributions to advancing software project management (SPM) through advanced predictive modeling, setting a new benchmark for software engineering effort estimation.
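
The percentage improvements quoted above are relative reductions in an error metric. As a minimal sketch of how such figures are typically computed (an assumption about the evaluation procedure, which the abstract does not spell out), the following Python snippet defines MAE and RMSE and the relative improvement of one model over a baseline. The effort values and the names `baseline` and `proposed` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted effort values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted effort values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_error, proposed_error):
    """Percentage by which the proposed error is lower than the baseline error."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
baseline = [120.0, 170.0, 340.0, 380.0]   # estimates from a baseline model
proposed = [110.0, 185.0, 320.0, 390.0]   # estimates from an improved model

baseline_mae = mae(actual, baseline)
proposed_mae = mae(actual, proposed)
mae_gain = relative_improvement(baseline_mae, proposed_mae)

baseline_rmse = rmse(actual, baseline)
proposed_rmse = rmse(actual, proposed)
rmse_gain = relative_improvement(baseline_rmse, proposed_rmse)

print(f"MAE improvement:  {mae_gain:.2f}%")
print(f"RMSE improvement: {rmse_gain:.2f}%")
```

A positive value means the proposed model has the lower error. Averaging such per-dataset values would give a range of the kind reported for NIVIM, although the exact aggregation used in the study is not stated here.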
