Abstract: Detecting software defects before they occur is crucial in software engineering because it affects the quality and reliability of software systems. Previous studies on software defect prediction have typically used software features such as code size, complexity, coupling, cohesion, inheritance, and other software metrics to forecast whether a code file or commit is likely to be defective in the future. However, it is advantageous to limit the number of features used in a defect prediction model, both to avoid the problems associated with multicollinearity and the “curse of dimensionality” and to simplify the data analysis. With fewer features, the model can concentrate on the most significant variables, which can improve its accuracy. This paper investigates the impact of eight feature selection methods on the accuracy and stability of six supervised learning models. The study is novel in that it exhaustively pairs each of the eight feature selection techniques with each of the six supervised learning models. Two notable findings emerged. First, the association- and coherence-based techniques achieved the highest accuracy in defect prediction, and the models trained on the features they selected outperformed those trained on the original features. Second, Correlation feature selection, Recursive feature elimination, and Ridge feature selection, when combined with the Support vector machine and Decision tree classifiers, consistently selected low-variance features across multiple supervised defect prediction models. Combined with different classifiers, these techniques performed strongly on the publicly available NASA datasets CM1 and PC2, reaching an accuracy of over 85% for CM1 and 95% for PC2, with precision, recall, and F-measure values exceeding 95%. These were the best results obtained in the evaluation.
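To make the idea of filter-style feature selection concrete, the following is a minimal sketch and not the procedure used in this study. It ranks features by the absolute Pearson correlation between each feature and the defect label and keeps the top k, which is a simpler univariate stand-in for the Correlation feature selection named in the abstract. The data are synthetic, and the helper names (`pearson`, `select_top_k`) and the feature names (`loc`, `complexity`, `noise_a`, `noise_b`) are hypothetical; the NASA CM1 and PC2 datasets are not used here.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Return indices of the k features with the largest |correlation| to the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:k]]


# Synthetic stand-in for a module-level metrics table (assumption, not real data).
# Label 1 marks a defective module, label 0 a clean one.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = []
for y in labels:
    loc = 50 + 40 * y + rng.gauss(0, 10)      # code size, shifted for defective modules
    complexity = 5 + 4 * y + rng.gauss(0, 2)  # complexity, shifted for defective modules
    noise_a = rng.gauss(0, 1)                 # unrelated to the label
    noise_b = rng.gauss(0, 1)                 # unrelated to the label
    rows.append([loc, noise_a, complexity, noise_b])

selected = select_top_k(rows, labels, k=2)
print("selected feature indices:", selected)
```

In a full experiment of the kind the abstract describes, the reduced feature set would then be passed to each classifier and compared against the original features on accuracy, precision, recall, and F-measure.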
