Abstract: Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. There are several reasons to keep the number of features used in a defect prediction model small. For example, using a small number of features avoids the problem of multicollinearity and the so-called ‘curse of dimensionality’. Feature selection and feature reduction techniques can both help to reduce the number of features in a model. Feature selection techniques do so by selecting the most important of the original features, while feature reduction techniques do so by creating new, combined features from the original features. Several recent studies have investigated the impact of feature selection techniques on defect prediction. However, no large-scale study has investigated the impact of multiple feature reduction techniques on defect prediction. In this paper, we study the impact of eight feature reduction techniques on the performance, and the variance in performance, of five supervised and five unsupervised defect prediction models. In addition, we compare the impact of the studied feature reduction techniques with the impact of the two best-performing feature selection techniques (according to prior work). The following findings are the highlights of our study: (1) The studied correlation-based and consistency-based feature selection techniques result in the best-performing supervised defect prediction models, while neural network-based feature reduction techniques (restricted Boltzmann machine and autoencoder) result in the best-performing unsupervised defect prediction models. In both cases, the defect prediction models that use the selected or generated features perform better than those that use the original features (in terms of AUC and performance variance). (2) Neural network-based feature reduction techniques generate features that have a small variance in performance across both supervised and unsupervised defect prediction models. Hence, we recommend that practitioners who do not wish to choose a best-performing defect prediction model for their data use a neural network-based feature reduction technique.
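The distinction the abstract draws between feature selection and feature reduction can be illustrated with a minimal sketch. It is not the study's pipeline: the data is synthetic, the function names `select_top_k` and `reduce_with_pca` are hypothetical, a plain correlation ranking stands in for the correlation-based subset selection studied in the paper, and PCA is assumed here only as a familiar example of a reduction technique.

```python
import numpy as np


def select_top_k(X, y, k):
    """Feature selection: keep the k ORIGINAL columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Guard against constant columns (zero variance).
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    keep = np.sort(np.argsort(corr)[::-1][:k])
    return X[:, keep], keep


def reduce_with_pca(X, k):
    """Feature reduction: build k NEW features as linear mixes of all columns."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:k]
    return Xc @ components.T, components


# Synthetic stand-in for software metrics (assumption: 4 made-up metrics).
rng = np.random.default_rng(0)
n = 200
loc = rng.normal(size=n)                           # e.g. lines of code
churn = 2.0 * loc + rng.normal(scale=0.1, size=n)  # nearly collinear with loc
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
X = np.column_stack([loc, churn, noise_a, noise_b])
y = (loc + rng.normal(scale=0.5, size=n) > 0).astype(float)  # defect label

X_sel, keep = select_top_k(X, y, k=2)        # a subset of the original columns
X_red, components = reduce_with_pca(X, k=2)  # new, combined columns
```

In this sketch `X_sel` contains two of the original metric columns unchanged, whereas each column of `X_red` is a weighted combination of all four metrics, which is the sense in which reduction "creates new, combined features".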

The Impact of Feature Selection on Defect Prediction Performance: an Empirical Comparison

An Empirical Study on the Effectiveness of Feature Selection for Cross-Project Defect Prediction

An Empirical Study on the Equivalence and Stability of Feature Selection for Noisy Software Defect Data

An empirical analysis of feature selection techniques for Software Defect Prediction

The Impact of Feature Selection Techniques on Effort-Aware Defect Prediction: an Empirical Study

A Noise Tolerable Feature Selection Framework for Software Defect Prediction

Empirical studies on feature selection for software fault prediction

FECAR: A Feature Selection Framework for Software Defect Prediction

The impact of feature reduction techniques on defect prediction models

Classifier Evaluation for Software Defect Prediction

An Empirical Study on Pareto Based Multi-Objective Feature Selection for Software Defect Prediction

Feature selection for software defect prediction using an improved firefly algorithm

FSDNP: Feature Selection Method for Software Defect Number Prediction

A Software Defect Prediction Approach Based on Hybrid Feature Dimensionality Reduction

Predicting the Severity of Bug Reports Based on Feature Selection

ELM and KELM based software defect prediction using feature selection techniques

An Empirical Study of Public Data Quality Problems in Cross Project Defect Prediction

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

Cross‐project defect prediction method based on genetic algorithm feature selection

A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

FECS: A Cluster Based Feature Selection Method for Software Fault Prediction with Noises