Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Context:</h3><p>A software crash is a serious form of software failure that often occurs during software development and maintenance. Because the stack trace reported when the software crashes contains a wealth of information about the crash, recent work has used classification models, built on features collected from stack traces and source code, to predict whether the fault causing the crash resides in the stack trace. Such predictions can speed up the crash localization task.</p>
<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Objective:</h3><p>Because the quality of features affects the performance of the constructed classification models, researchers have proposed using feature selection methods to select a representative feature subset and build models on it in place of the original features. However, previous work considered only a limited set of feature selection methods and classification models. In this work, we examine this topic in depth to identify the best feature selection method for the crashing fault residence prediction task.</p>
<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Method:</h3><p>We study the performance of 24 feature selection techniques with 21 classification models on a benchmark dataset containing crash instances from 7 real-world software projects. We use 4 indicators to evaluate the performance of these feature selection methods when applied to the classification models.</p>
<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results:</h3><p>The experimental results show that, overall, a probability-based feature selection method called Symmetrical Uncertainty performs well across the studied classification models and projects. We therefore recommend using this feature selection method to preprocess the crash instances before constructing classification models to predict crashing fault residence.</p>
<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion:</h3><p>This work conducts a large-scale empirical study of the impact of feature selection methods on the performance of classification models for the crashing fault residence prediction task. The results show that performance differs significantly among these feature selection techniques across classification models and projects.</p>
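
To make the recommended technique concrete, the following is a minimal sketch of Symmetrical Uncertainty used as a filter-style feature ranker. It relies on the standard definition SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) and assumes features that are already discrete (continuous metrics would need to be discretized first). The feature names, the toy data, and the helper functions are hypothetical illustrations, not the study's implementation or dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be learned
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy              # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Return (name, SU score) pairs sorted from most to least relevant."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: does the crashing fault reside in the stack trace?
labels = ["in", "in", "out", "out"]
columns = {
    "aligned_feature": [1, 1, 0, 0],      # fully determines the label
    "independent_feature": [1, 0, 1, 0],  # carries no label information
    "constant_feature": [0, 0, 0, 0],     # zero entropy
}
ranking = rank_features(columns, labels)
top_feature = ranking[0][0]
```

In a pipeline, the top-ranked features from such a ranking would be kept as the reduced feature subset before a classifier is trained. How many features to keep is a separate choice that this sketch does not address.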

An Empirical Study on the Equivalence and Stability of Feature Selection for Noisy Software Defect Data

A Noise Tolerable Feature Selection Framework for Software Defect Prediction

An empirical analysis of feature selection techniques for Software Defect Prediction

FECS: A Cluster Based Feature Selection Method for Software Fault Prediction with Noises

Empirical studies on feature selection for software fault prediction

U^2F^2S^2 : Uncovering Feature-level Similarities for Unsupervised Feature Selection

An Empirical Study on Pareto Based Multi-Objective Feature Selection for Software Defect Prediction

FSDNP: Feature Selection Method for Software Defect Number Prediction

FECAR: A Feature Selection Framework for Software Defect Prediction

Feature Selection With Local Density-Based Fuzzy Rough Set Model for Noisy Data

The Impact of Feature Selection Techniques on Effort-Aware Defect Prediction: an Empirical Study

A many objective based feature selection model for software defect prediction

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

ELM and KELM based software defect prediction using feature selection techniques

Discriminating features-based cost-sensitive approach for software defect prediction

A comprehensive investigation of the impact of feature selection techniques on crashing fault residence prediction models

Analysis and comparison of feature selection methods towards performance and stability

A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

A Software Defect Prediction Approach Based on Hybrid Feature Dimensionality Reduction

Feature Selection: A Data Perspective

A software defect prediction method with metric compensation based on feature selection and transfer learning