Multi-level data pre-processing for software defect prediction

Armah, G.K., Guangchun Luo, Ke Qin
DOI: https://doi.org/10.1109/ICIII.2013.6703111
2013-01-01
Abstract: Early detection of defective software components enables verification experts to devote more time and allocate scarce resources to the problem areas of the system under development. This is the value of defect prediction: when defects are detected at the early stages, it streamlines testing efforts and reduces the development cost of software. An important step in building effective predictive models is to apply one or more sampling techniques. A model is considered effective if it classifies defective and non-defective modules as accurately as possible. In this paper we considered the outcome of data pre-processing by filtering and compared the performance with that on the original, non-pre-processed dataset. We compared the performance of four K-Nearest Neighbor (KNN) classifiers (LWL, Kstar, IBK, IB1) with Non-Nested Generalized Exemplars (NNGE), Random Tree, and Random Forest. We observed that our multi-level data pre-processing, which includes double attribute selection and tripartite instance filtering, enhanced the defect prediction results. We also observed that the two filtering methods, attribute selection and resampling, each improved the prediction results when applied independently. The excellent performance achieved could be attributed to the removal of irrelevant attributes through dimension reduction and to resampling, which handled the problem of class imbalance. Together these led to the improved performance of the classifiers considered. NNGE, as its name implies, avoided generalization on some of the pre-processed datasets, namely those with more than 2,000 instances (JM1 = 10,885 and KC1 = 2,109); this may be due to conflicting instances. We also used the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) measures to check the effectiveness of our model.
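The two error measures named in the abstract can be illustrated with a minimal sketch. It assumes the errors are taken between a 0/1 defect label and a predicted defect probability; the abstract does not say how the authors computed them, and the function names and sample values below are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all instances."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical data: 1 = defective module, 0 = non-defective module.
actual = [1, 0, 0, 1, 0]
# Hypothetical predicted probabilities of the "defective" class.
predicted = [0.9, 0.2, 0.1, 0.4, 0.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

RMSE squares each error before averaging, so the single poorly predicted module (0.4 for a defective one) weighs more heavily in RMSE than in MAE. Lower values of either measure indicate better predictions.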