Abstract:

Context: Software development effort estimation (SDEE) is one of the most challenging aspects of project management. The presence of missing data (MD) in software attributes makes SDEE even more complex. K‐nearest neighbors imputation (KNNI) has been widely used in SDEE to deal with the MD issue. However, KNNI, in its classical process, has low tolerance to imprecision and uncertainty, especially when dealing with categorical features. When dealing with categorical attributes, classical KNNI represents software attributes mainly with numbers or classical intervals and compares them with similarity measures originally designed for numerical attributes.

Objectives: This paper evaluates an optimized fuzzy clustering‐based KNNI (FC‐KNNI) and compares it with classical KNNI when dealing with mixed missing data in the context of SDEE.

Methods: We investigate the effect of two imputation techniques (FC‐KNNI and KNNI) on five SDEE techniques: case‐based reasoning (CBR), fuzzy case‐based reasoning (F‐CBR), support vector regression, multilayer perceptron, and reduced‐error pruning tree. The evaluation is carried out on six publicly available SDEE datasets using two performance measures: standardized accuracy (SA) and Pred(0.25). The Wilcoxon statistical test is also performed to assess the significance of the results.

Results: The results are promising: using an imputation technique designed for mixed data is better than reusing methods originally designed for numerical data. FC‐KNNI significantly outperforms KNNI regardless of the SDEE technique and dataset used. Another important finding is that F‐CBR improves the analogy process compared to CBR.

Conclusion: Introducing fuzzy sets and fuzzy clustering into the analogy process improves its performance in terms of SA and Pred(0.25).
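To make the baseline concrete, the following Python sketch shows one common form of classical KNNI on mixed records. It is an illustration under stated assumptions, not the configuration used in the paper: the function name `knn_impute`, the default `k=3`, the min-max-scaled Euclidean distance, and mean/mode aggregation are all hypothetical choices. Categorical values are compared by exact match only (distance 0 or 1), a crisp comparison with no notion of partial similarity; this is the kind of low tolerance to imprecision that the abstract attributes to classical KNNI.

```python
import math
from collections import Counter


def _is_numeric(rows, attr):
    # Treat an attribute as numeric when every observed value is an int or float.
    return all(isinstance(r[attr], (int, float)) for r in rows if r[attr] is not None)


def _distance(x, y, attrs, numeric, spans):
    # Euclidean distance over the attributes observed in both records:
    # numeric differences are min-max scaled, categorical ones count 0 (match) or 1.
    total, used = 0.0, 0
    for a in attrs:
        if x[a] is None or y[a] is None:
            continue
        d = abs(x[a] - y[a]) / spans[a] if a in numeric else float(x[a] != y[a])
        total += d * d
        used += 1
    return math.sqrt(total / used) if used else math.inf


def knn_impute(rows, k=3):
    """Hypothetical classical KNNI: fill None values in dict records from the
    k nearest complete records. Assumes at least one complete record exists."""
    attrs = list(rows[0])
    numeric = {a for a in attrs if _is_numeric(rows, a)}
    spans = {}
    for a in numeric:
        vals = [r[a] for r in rows if r[a] is not None]
        spans[a] = (max(vals) - min(vals) if vals else 0) or 1.0
    complete = [r for r in rows if all(v is not None for v in r.values())]

    imputed = []
    for r in rows:
        filled = dict(r)  # never modify the caller's records
        missing = [a for a in attrs if r[a] is None]
        if missing:
            donors = sorted(complete, key=lambda c: _distance(r, c, attrs, numeric, spans))[:k]
            for a in missing:
                vals = [d[a] for d in donors]
                if a in numeric:
                    filled[a] = sum(vals) / len(vals)  # mean of the neighbours
                else:
                    filled[a] = Counter(vals).most_common(1)[0][0]  # mode of the neighbours
        imputed.append(filled)
    return imputed
```

Distances are averaged over the attributes both records share, so a record with many gaps is not automatically judged far from every donor; this normalisation is itself an assumption of the sketch.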
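The abstract does not detail how FC‐KNNI is optimized or how it handles categorical attributes, so the Python sketch below only illustrates the general idea of fuzzy-clustering-based KNN imputation on numeric data: run fuzzy c-means (FCM) on the complete records, assign each incomplete record to the cluster in which it has the highest membership (computed on its observed attributes only), and search for the k nearest donors inside that cluster. The functions `fuzzy_c_means`, `memberships`, and `fc_knn_impute`, the fuzzifier `m=2`, the cluster count `c=2`, and the fallback to all complete records are hypothetical assumptions, and attributes are assumed to be on comparable scales already.

```python
import math
import random


def memberships(point, centroids, m=2.0, dims=None):
    # Fuzzy c-means membership of `point` in each cluster, computed only on `dims`.
    dims = range(len(point)) if dims is None else dims
    dist = [math.sqrt(sum((point[d] - c[d]) ** 2 for d in dims)) for c in centroids]
    if 0.0 in dist:  # point coincides with a centroid: crisp membership
        hits = dist.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in dist]
    return [1.0 / sum((di / dk) ** (2 / (m - 1)) for dk in dist) for di in dist]


def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    # Standard FCM: alternate centroid and membership updates from a random start.
    rng = random.Random(seed)
    u = []
    for _ in points:
        w = [rng.random() for _ in range(c)]
        u.append([x / sum(w) for x in w])
    for _ in range(iters):
        centroids = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centroids.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                              for d in range(len(points[0]))])
        u = [memberships(p, centroids, m) for p in points]
    return centroids, u


def _top(row):
    # Index of the cluster with the highest membership.
    return max(range(len(row)), key=row.__getitem__)


def fc_knn_impute(rows, k=2, c=2, m=2.0):
    """Hypothetical FC-KNNI sketch: impute None entries in numeric rows with
    KNN restricted to the incomplete row's dominant fuzzy cluster."""
    complete = [r for r in rows if None not in r]
    centroids, u = fuzzy_c_means(complete, c, m)
    imputed = []
    for r in rows:
        if None not in r:
            imputed.append(list(r))
            continue
        observed = [d for d, v in enumerate(r) if v is not None]
        best = _top(memberships(r, centroids, m, observed))
        # Donor pool: complete records whose dominant cluster matches r's (fallback: all).
        pool = [p for p, up in zip(complete, u) if _top(up) == best] or complete
        donors = sorted(pool, key=lambda p: math.dist([r[d] for d in observed],
                                                      [p[d] for d in observed]))[:k]
        imputed.append([v if v is not None else sum(p[d] for p in donors) / len(donors)
                        for d, v in enumerate(r)])
    return imputed
```

Restricting the donor search to one fuzzy cluster is the only difference from plain KNN imputation here; weighting donors by their membership degrees would be a natural variant, but nothing in the abstract says which variant the paper uses.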
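Both performance measures can be computed in a few lines. SA compares an estimator's mean absolute residual (MAR) with that of random guessing, SA = (1 − MAR / MAR_P0) × 100, and Pred(0.25) is the percentage of projects whose magnitude of relative error, MRE = |actual − estimate| / actual, is at most 0.25. In this Python sketch, MAR_P0 is computed exactly as the expected MAR when each project's effort is guessed by picking another project's actual effort uniformly at random; some studies approximate it instead by averaging many random runs. The function names are hypothetical.

```python
from statistics import mean


def mar(actual, predicted):
    # Mean absolute residual between actual and estimated effort.
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mar_p0(actual):
    # Expected MAR of random guessing: project i is "estimated" by the actual
    # effort of another project j != i chosen uniformly, averaged exactly.
    n = len(actual)
    return mean(
        sum(abs(actual[i] - actual[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    )


def standardized_accuracy(actual, predicted):
    # SA = (1 - MAR / MAR_P0) * 100; 0 means no better than random guessing.
    return (1 - mar(actual, predicted) / mar_p0(actual)) * 100


def pred(actual, predicted, level=0.25):
    # Percentage of projects whose MRE = |actual - estimate| / actual is <= level.
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return 100 * hits / len(actual)
```

With actual efforts [100, 200, 300, 400] and estimates [110, 180, 390, 400], MAR is 30 and MAR_P0 is 500/3, so SA is 82 and Pred(0.25) is 75 (three of the four MREs are at most 0.25).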

Optimized fuzzy clustering‐based k‐nearest neighbors imputation for mixed missing data in software development effort estimation
