Performance comparison of Machine learning classifiers in Software Defects Prediction
Naeem Hussain, Li Bixin, Yuan Xiaodong, Israr Hussain
2020-01-01
Abstract:
Background: In the software development life cycle, software testing is the main stage at which software defects can be minimized. Software defect prediction (SDP) is a domain that has received much attention from software researchers over the past couple of years. Its aim is to minimize cost and time and to improve the efficiency of software. The main aim of this research is to present a comparative analysis of software defect prediction based on the support vector machine (SVM) and the extreme learning machine (ELM). In this domain, defect prediction models are built using three prediction approaches that differ in how the training and test data are chosen: cross-validation prediction, cross-version prediction, and cross-project prediction. In this study we use the cross-version approach: data from an old version of a software project is used as training data to develop the prediction model, and the model is evaluated on the current version of the same project.
Materials and Methods: In our study, we consider three different versions of the Eclipse version control system and split the data into training and test sets. We choose different object-oriented metrics and algorithms to build our model, aiming to predict software defects in different versions. To train our model we use SVM and ELM. To validate the prediction models, we calculate their performance using popular measures such as accuracy, precision, recall, and AUC (area under the ROC curve).
Results: By comparing the file-based results of SVM and ELM, we obtain the average accuracy and AUC values. ELM has the highest AUC value, and its accuracy is also close to that of SVM; conversely, SVM has similar accuracy and a very close AUC value. We then examine how these models perform in package-based prediction.
Comparing SVM and ELM in package-based prediction, the accuracy and AUC values show that SVM has the best accuracy, but the AUC value decreases noticeably. We can therefore conclude that SVM has the best prediction results for file-based defects. The results demonstrate that the support vector machine is the best fit for cross-version defect prediction.
Conclusion: Software testing has become increasingly important to software reliability over the last couple of years. However, software testing consumes much time, resources, and money. Software defect prediction can help improve the efficiency of software testing and guide resource allocation. In this study, we discussed the key techniques, including software metrics, classifiers, and defect prediction models and their evaluation. Python is one of the most widely used languages, especially in data science. A significant factor driving its adoption is the variety of data science and data analytics libraries available to practitioners. Pandas, NumPy, SciPy, and Scikit-Learn are some of the libraries well known in the data science community, and the set of available libraries has kept growing over time. In data science, machine learning is one of the significant elements used to maximize value from data.
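The cross-version setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Eclipse metric data is not included here, so synthetic data from scikit-learn's `make_classification` stands in for an "old" and a "new" version. The `SimpleELM` class and the `evaluate` helper are hypothetical names introduced for illustration. `SimpleELM` follows the usual ELM recipe (random, fixed hidden-layer weights with output weights solved by least squares), which may differ from the ELM variant used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class SimpleELM:
    """Hypothetical minimal ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations; W and b stay fixed after random initialisation.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Map labels {0, 1} to targets {-1, +1} and solve for output weights
        # with the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ (2.0 * y - 1.0)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def evaluate(model, X_test, y_test):
    """Compute the four measures named in the abstract."""
    y_pred = model.predict(X_test)
    scores = model.decision_function(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "auc": roc_auc_score(y_test, scores),
    }


# Assumption: synthetic stand-in for code metrics, with roughly 20% defective modules.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_old, y_old = X[:300], y[:300]   # plays the role of the older version (training)
X_new, y_new = X[300:], y[300:]   # plays the role of the current version (testing)

# Fit the scaler on the old version only, then apply it to the new version.
scaler = StandardScaler().fit(X_old)
X_old_s, X_new_s = scaler.transform(X_old), scaler.transform(X_new)

models = {"SVM": SVC(kernel="rbf"), "ELM": SimpleELM(n_hidden=50, seed=0)}
results = {name: evaluate(m.fit(X_old_s, y_old), X_new_s, y_new)
           for name, m in models.items()}

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With real data, `X_old` and `X_new` would hold the object-oriented metrics of two consecutive releases and `y_old`, `y_new` the corresponding defect labels. The numbers printed for the synthetic data say nothing about the results reported in the study.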