FeSCH: A Feature Selection Method Using Clusters of Hybrid-data for Cross-Project Defect Prediction.

Chao Ni, Wangshu Liu, Qing Gu, Xiang Chen, Daoxu Chen
DOI: https://doi.org/10.1109/compsac.2017.127
2017-01-01
Abstract: Cross-project defect prediction (CPDP) is a challenging task because a predictor built on source projects rarely generalizes well to the target project. Previous studies have shown that both feature mapping and feature selection can reduce the differences between source and target projects. In this paper, we propose a novel method, FeSCH (Feature Selection using Clusters of Hybrid-data), which consists of two phases. The first is the feature clustering phase, which uses a density-based clustering method, DPC, to group highly correlated features into clusters. The second is the feature selection phase, which selects beneficial features from each cluster; we design three ranking strategies to choose appropriate features. In our empirical studies, we design experiments based on real-world software projects and evaluate the prediction performance of FeSCH by analyzing the influence of the ranking strategies. The experimental results show that FeSCH outperforms three baseline methods (i.e., WPDP, ALL, and TCA+) in most cases, and that its performance is independent of the classifier used.
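The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified density-peaks clustering over a feature-to-feature distance of 1 - |Pearson correlation|, a Gaussian density kernel, and a cutoff distance taken from a percentile of the pairwise distances. The function names `dpc_feature_clusters` and `select_per_cluster`, the `dc_percentile` parameter, and the use of local density as the ranking score are all hypothetical choices made for illustration; the abstract does not specify the three ranking strategies.

```python
import numpy as np


def dpc_feature_clusters(X, n_clusters, dc_percentile=20.0):
    """Group the columns of X with a simplified density-peaks clustering.

    Assumption: distance between two features is 1 - |Pearson correlation|.
    Returns (labels, rho): a cluster label and a local density per feature.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    n = dist.shape[0]

    # Cutoff distance: a percentile of the pairwise distances (assumed heuristic).
    pairwise = dist[np.triu_indices(n, k=1)]
    dc = max(np.percentile(pairwise, dc_percentile), 1e-12)

    # Local density with a Gaussian kernel; subtract the self-contribution.
    rho = np.exp(-((dist / dc) ** 2)).sum(axis=1) - 1.0

    # delta: distance to the nearest feature of higher density.
    order = np.argsort(-rho)  # indices in descending density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Cluster centers: features with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density peak is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Remaining features inherit the label of their nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho


def select_per_cluster(labels, scores, per_cluster=1):
    """Pick the highest-scoring features from each cluster."""
    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        top = members[np.argsort(-scores[members])[:per_cluster]]
        selected.extend(int(i) for i in top)
    return sorted(selected)
```

Here `X` would be a matrix with one row per software module and one column per metric. Calling `dpc_feature_clusters(X, n_clusters=k)` and then `select_per_cluster(labels, rho)` yields one column index per cluster; any other per-feature score can be passed in place of `rho`.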