An Empirical Study on the Equivalence and Stability of Feature Selection for Noisy Software Defect Data

Zhou Xu, Jin Liu, Zhen Xia, Peipei Yuan
DOI: https://doi.org/10.18293/seke2017-097
2017-01-01
Abstract: Software Defect Data (SDD) are used to build defect prediction models for software quality assurance. Existing work employs feature selection to eliminate irrelevant features from the data and thereby improve prediction performance. Previous studies have shown that different feature selection methods do not always yield similar prediction performance on SDD, which indicates that these methods are not equivalent. Previous studies have also shown that SDD usually contain noise that may interfere with the feature selection process. In this work, we empirically investigate and measure the equivalence of different feature selection methods for SDD. We further analyze the stability of these methods on noisy SDD. We perform statistical analyses on eight projects from the NASA dataset using eight feature selection methods. For the equivalence analysis, we use Principal Component Analysis (PCA) to analyze the equivalence of these methods qualitatively and an overlap index to analyze it quantitatively. For the stability analysis, we apply a consistency index to measure the stability of these methods. Experimental results indicate that different feature selection methods are indeed not equivalent to each other, and that the Correlation and Fisher Score methods achieve better stability.
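The abstract does not state the formulas behind its overlap and consistency indices. A common choice for the latter in feature-selection stability studies is Kuncheva's consistency index, which corrects the raw overlap of two equal-size subsets for the overlap expected by chance. The Python sketch below assumes that definition, plus a simple shared-fraction overlap ratio, and scores stability as the average pairwise consistency of subsets selected from perturbed (for example, noise-injected) copies of the data. The function names and both definitions are illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations
from typing import Sequence, Set


def overlap_ratio(a: Set[int], b: Set[int]) -> float:
    """Fraction of features shared by two equal-size subsets (assumed definition)."""
    if len(a) != len(b) or not a:
        raise ValueError("subsets must be non-empty and of equal size")
    return len(a & b) / len(a)


def kuncheva_consistency(a: Set[int], b: Set[int], n: int) -> float:
    """Kuncheva's consistency index for two size-k subsets drawn from n features.

    I_C = (r*n - k^2) / (k*(n - k)), where r = |a & b|.
    Equals 1 for identical subsets and 0 for the overlap expected by chance.
    """
    k = len(a)
    if len(b) != k or not 0 < k < n:
        raise ValueError("need equal-size subsets with 0 < k < n")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))


def stability(subsets: Sequence[Set[int]], n: int) -> float:
    """Average pairwise consistency over subsets selected from perturbed data."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_consistency(a, b, n) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical top-3 subsets from a 10-feature dataset.
    a, b = {0, 1, 2}, {0, 1, 3}
    print(overlap_ratio(a, b))             # 2 of 3 features shared
    print(kuncheva_consistency(a, b, 10))  # 11/21
    print(stability([a, a, b], 10))        # 43/63
```

Unlike the raw overlap ratio, the consistency index does not reward a method merely for selecting a large fraction of all available features, which matters when the feature sets are small, as in many defect datasets.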