Abstract: Effort‐Aware Defect Prediction (EADP) methods rank software modules by defect density and guide the testing team to inspect the modules with the highest defect density first. Previous studies indicated that some feature selection methods can improve the performance of Classification‐Based Defect Prediction (CBDP) models, and that the Correlation‐based feature subset selection method with the Best First strategy (CorBF) performs best. However, the practical benefits of feature selection methods for EADP performance are still unknown, and blindly using CorBF, the best‐performing method in CBDP, to pre‐process defect datasets may not improve EADP models and may even degrade their performance. To assess the impact of feature selection techniques on EADP, we examined 24 feature selection methods combined with 10 classifiers embedded in a state‐of‐the‐art EADP model (CBS+) on 41 PROMISE defect datasets, and we used six evaluation metrics to assess the performance of the EADP models comprehensively. The results show that (1) the impact of the feature selection methods varies across classifiers and datasets; (2) the four wrapper‐based feature subset selection methods with forwards search, that is, AdaBoost with Forwards Search, Deep Forest with Forwards Search, Random Forest with Forwards Search, and XGBoost with Forwards Search (XGBF), outperform the other methods across the studied classifiers and datasets, and XGBF with XGBoost as the embedded classifier in CBS+ performs best on the studied datasets; (3) CorBF, the best‐performing method in CBDP, does not perform well on the EADP task; (4) the selected features vary with the feature selection method and the dataset, and the features noc (number of children), ic (inheritance coupling), cbo (coupling between object classes), and cbm (coupling between methods) are frequently selected by the four wrapper‐based feature subset selection methods with forwards search; and (5) using AdaBoost, deep forest, random forest, or XGBoost as the base classifier embedded in CBS+ achieves the best performance. In summary, we recommend that software testing teams employ XGBF with XGBoost as the embedded classifier in CBS+ to enhance EADP performance.
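To make the idea of inspecting high-density modules first concrete, here is a minimal Python sketch of density-based ordering and one common effort-aware metric: the share of defective modules found within 20% of the total inspection effort. It assumes that lines of code (LOC) approximate inspection effort and that defect density is estimated as a classifier's predicted defect probability divided by LOC. The `Module` type, the function names, and the five toy modules are hypothetical and not taken from the study; the exact CBS+ ordering and the six metrics used in the study may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    name: str
    loc: int          # module size, used here as a proxy for inspection effort
    prob: float       # predicted probability of being defective (from any classifier)
    defective: bool   # ground-truth label, used only for evaluation


def rank_by_density(modules: List[Module]) -> List[Module]:
    """Order modules by estimated defect density (prob / LOC), highest first."""
    return sorted(modules, key=lambda m: m.prob / max(m.loc, 1), reverse=True)


def recall_at_effort(ranked: List[Module], effort_ratio: float = 0.2) -> float:
    """Share of all defective modules found when inspecting `ranked` in order,
    stopping before the first module that would exceed `effort_ratio` of total LOC."""
    budget = effort_ratio * sum(m.loc for m in ranked)
    total_defective = sum(m.defective for m in ranked)
    if total_defective == 0:
        return 0.0
    spent, found = 0, 0
    for m in ranked:
        if spent + m.loc > budget:
            break
        spent += m.loc
        found += m.defective
    return found / total_defective


# Hypothetical toy data: five modules totalling 1000 LOC, three of them defective.
modules = [
    Module("A", 500, 0.9, True),
    Module("B", 50, 0.6, True),
    Module("C", 300, 0.2, False),
    Module("D", 100, 0.5, False),
    Module("E", 50, 0.4, True),
]

ranked = rank_by_density(modules)
print([m.name for m in ranked])   # ['B', 'E', 'D', 'A', 'C']
print(recall_at_effort(ranked))   # 0.6666666666666666
```

Ranking the same toy modules by predicted probability alone puts the 500-LOC module A first; it alone exceeds the 200-LOC budget, so no defective module is found. Dividing by size surfaces small, likely-defective modules first, which is the motivation for density-based ordering.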
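Wrapper-based methods with forwards search, such as XGBF, judge each candidate feature subset by training and evaluating a classifier on it. The following Python sketch shows generic greedy sequential forward selection under stated assumptions: a tiny nearest-centroid classifier scored by leave-one-out accuracy stands in for the AdaBoost, deep forest, random forest, or XGBoost classifiers named in the abstract, and the stopping rule (stop when no added feature strictly improves the score) is an assumption rather than the study's documented setting. The toy data and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def forward_selection(n_features: int,
                      score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Greedy sequential forward selection (wrapper style).

    Start from the empty set; in each round, try adding every remaining feature,
    keep the one giving the highest `score`, and stop when no addition strictly
    improves on the current best score.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = list(range(n_features))
    while remaining:
        best_feature: Optional[int] = None
        round_best = best_score
        for f in remaining:
            s = score(selected + [f])   # train/evaluate a classifier on this subset
            if s > round_best:          # strict improvement; ties keep the lower index
                best_feature, round_best = f, s
        if best_feature is None:        # no feature helps: stop searching
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = round_best
    return selected, best_score


def loo_nearest_centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                                  features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:                  # hold out row i
                continue
            vec = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                vec[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label: int) -> float:
            centroid = [v / counts[label] for v in sums[label]]
            return sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))

        # Predict the class whose centroid is closest to the held-out row.
        correct += min(sums, key=sq_dist) == y[i]
    return correct / len(X)


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[1.0, 5.0, 3.0], [1.2, 1.0, 7.0], [0.8, 9.0, 1.0],
     [3.0, 5.0, 7.0], [3.2, 1.0, 1.0], [2.8, 9.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]

selected, acc = forward_selection(3, lambda feats: loo_nearest_centroid_accuracy(X, y, feats))
print(selected, acc)   # [0] 1.0
```

On this toy data the search selects feature 0 and then stops, because adding either noise feature cannot raise the leave-one-out accuracy above 1.0. Passing a different `score` callable (for example, cross-validated performance of a gradient-boosting model) changes the wrapper's classifier without changing the search loop.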

FECAR: A Feature Selection Framework for Software Defect Prediction

A Noise Tolerable Feature Selection Framework for Software Defect Prediction

Empirical studies on feature selection for software fault prediction

A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

FECS: A Cluster Based Feature Selection Method for Software Fault Prediction with Noises

FSDNP: Feature Selection Method for Software Defect Number Prediction

FeSCH: A Feature Selection Method Using Clusters of Hybrid-data for Cross-Project Defect Prediction

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

An empirical analysis of feature selection techniques for Software Defect Prediction

A Software Defect Prediction Approach Based on Hybrid Feature Dimensionality Reduction

The Impact of Feature Selection Techniques on Effort-Aware Defect Prediction: an Empirical Study

Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning

A software defect prediction method with metric compensation based on feature selection and transfer learning

MCDM-EFS: A novel ensemble feature selection method for software defect prediction using multi-criteria decision making

Cross‐project defect prediction method based on genetic algorithm feature selection

Software defect prediction based on nested-stacking and heterogeneous feature selection

An Integrated Semi-supervised Software Defect Prediction Model

Feature Selection Using Firefly Algorithm With Tree-Based Classification In Software Defect Prediction

Combined Classifier for Cross-Project Defect Prediction: an Extended Empirical Study

Discriminating features-based cost-sensitive approach for software defect prediction

An Empirical Study on Pareto Based Multi-Objective Feature Selection for Software Defect Prediction