ELM and KELM based software defect prediction using feature selection techniques

Ishani Arora, Anju Saha
DOI: https://doi.org/10.1080/02522667.2019.1637999
2019-07-04
Journal of Information and Optimization Sciences
Abstract:
Context: Software defect prediction (SDP) models help deliver a dependable, genuine product to clients. However, the performance of these models is affected by the presence of irrelevant features in the datasets. Feature selection techniques address this problem.
Objectives: (1) To determine the performance of feature selection based classification models in the context of software defect prediction, and (2) to determine whether removing insignificant features makes a significant difference in the performance of SDP models.
Method: SDP models are built using two classifiers, the extreme learning machine (ELM) and the kernel based extreme learning machine (KELM), combined with five wrapper based and seven filter based feature selection techniques. Experiments are performed on seven datasets from the PROMISE repository. Testing accuracy is used to compare the performance of the feature selection based ELM and KELM defect classification models.
Results: (1) ELM based classifiers achieved higher testing accuracy with wrapper based feature selection methods, while KELM classifiers performed better with filter based methods. (2) Even after eliminating over 85 percent of the attributes from the original software project data, classification performance before and after removing the insignificant features was comparable in most cases, and it improved in very few experiments.
Conclusion: For feature selection based defect classification, ELM based models perform better with wrapper based methods and KELM based models perform better with filter based methods. Overall, a dimensionally reduced feature space does not significantly affect the prediction performance of SDP models. This indicates that the feature subsets remaining after the insignificant software metrics are removed are the ones most significant to the output class.
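The pipeline described under Method (select a feature subset, train an ELM, compare testing accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' setup: the abstract does not name the twelve selection techniques, so a simple correlation based filter stands in for them; synthetic data stands in for the PROMISE datasets; and the names `select_top_k_features`, `ELM`, and `accuracy`, along with the hidden layer size and sigmoid activation, are hypothetical choices. The ELM itself follows the standard formulation: random, fixed hidden layer weights, with output weights solved by a pseudo-inverse.

```python
import numpy as np


def select_top_k_features(X, y, k):
    """Filter-style selection (illustrative stand-in): keep the k features
    with the largest absolute Pearson correlation with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(scores)[::-1][:k])


class ELM:
    """Basic extreme learning machine for binary labels in {0, 1}."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden  # assumed size, not taken from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: least-squares solution via the pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def accuracy(model, X, y):
    return float((model.predict(X) == y).mean())


# Synthetic stand-in data: 20 "metrics", only the first 3 determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(float)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Select features on the training split only, then compare testing accuracy.
selected = select_top_k_features(X_train, y_train, k=3)
acc_all = accuracy(ELM().fit(X_train, y_train), X_test, y_test)
acc_selected = accuracy(
    ELM().fit(X_train[:, selected], y_train), X_test[:, selected], y_test
)
print("selected features:", selected.tolist())
print("testing accuracy, all features:", acc_all)
print("testing accuracy, selected features:", acc_selected)
```

Here 17 of the 20 columns are discarded, which mirrors the abstract's scenario of removing most attributes. A wrapper based method would instead score candidate subsets by training the classifier on each one, and KELM would replace the random hidden layer with a kernel matrix; neither is shown in this sketch.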