A Noise Tolerable Feature Selection Framework for Software Defect Prediction

Wang-Shu LIU, Xiang CHEN, Qing GU, Shu-Long LIU, Dao-Xu CHEN
DOI: https://doi.org/10.11897/SP.J.1016.2018.00506
2018-01-01
Abstract: Software defect prediction builds a prediction model by mining historical software repositories, then uses the trained model to identify potentially defect-prone program modules. However, noise is inevitable when software entities are labeled or measured. Although some researchers have investigated the noise tolerance of existing feature selection methods, few studies have proposed new feature selection methods that are themselves noise tolerant. To address this issue, we propose a novel framework, FECS (FEature Clustering with Selection strategies). FECS first groups the original features into a specified number of clusters using cluster analysis. It then selects the most typical feature from each cluster according to three heuristic feature selection strategies that we propose. In our empirical studies, we use real-world software projects such as Eclipse and NASA. We first apply a set of data preprocessing steps to improve the quality of these datasets. We then inject class-level and feature-level noise simultaneously to simulate noisy datasets. Using classical feature selection methods as baselines, we confirm the effectiveness of FECS and provide guidelines for its use, based on an analysis of the effects of varying the percentage of selected features, the noise injection rate, and the noise type.
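The two-stage idea in the abstract (cluster the features, then keep one representative per cluster) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or the three selection heuristics, so everything concrete below is an assumption made for illustration: features are compared by absolute Pearson correlation, clusters are formed by average-linkage agglomerative clustering, and the representative of each cluster is the feature most correlated with the class label. The function names (`pearson`, `cluster_features`, `select_features`, `inject_class_noise`) are hypothetical, and the noise injector covers only class-level noise by flipping binary labels.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_features(columns, k):
    """Group feature indices into k clusters.

    Assumption: distance between features is 1 - |Pearson r|, merged with
    average linkage. The paper's actual clustering method may differ.
    """
    m = len(columns)
    dist = [[1 - abs(pearson(columns[i], columns[j])) for j in range(m)]
            for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def select_features(columns, labels, k):
    """Pick one feature per cluster.

    Assumption: the "most typical" feature is the one with the highest
    absolute correlation with the class label (a stand-in heuristic).
    """
    clusters = cluster_features(columns, k)
    return sorted(max(c, key=lambda i: abs(pearson(columns[i], labels)))
                  for c in clusters)


def inject_class_noise(labels, rate, rng):
    """Flip a fraction `rate` of binary (0/1) labels, chosen at random."""
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), int(round(rate * len(noisy)))):
        noisy[i] = 1 - noisy[i]
    return noisy


if __name__ == "__main__":
    # Toy data: features 0 and 1 are near-duplicates, as are features 2 and 3.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [1, 2, 3, 5, 4, 6, 7, 8],
        [1, -1, 1, -1, 1, -1, 1, -1],
        [2, -2, 2, -2, 2, -2, 2, -1],
    ]
    noisy_labels = inject_class_noise(labels, 0.25, random.Random(0))
    print("clean labels :", select_features(columns, labels, 2))
    print("noisy labels :", select_features(columns, noisy_labels, 2))
```

Comparing the selected subsets under clean and noisy labels, across several noise rates and values of `k`, mirrors in miniature the kind of sensitivity analysis the abstract describes.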