Abstract: To help developers allocate their testing and debugging efforts effectively, many software defect prediction techniques have been proposed in the literature. These techniques predict which classes are more likely to be buggy based on the past history of classes, methods, or other code elements. They are effective provided that sufficient data are available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained on data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with CODEP (Logistic), the combined defect predictor that uses logistic regression as its meta classification algorithm and the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP (Logistic). Maximum voting shows the best performance in terms of F-measure, and its average F-measure is superior to that of CODEP (Logistic) by 36.88%. Bootstrap aggregation (Bagging (J48)) shows the best performance in terms of cost effectiveness, and its average cost effectiveness is superior to that of CODEP (Logistic) by 15.34%.
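To make the maximum voting idea and the F-measure named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The three threshold-based classifiers, the metric names (lines of code, cyclomatic complexity, churn), the thresholds, the toy instances, and the tie-breaking rule are all assumptions introduced for illustration only; none of them come from the study or from the PROMISE data.

```python
from collections import Counter
from typing import Callable, List, Sequence

Instance = Sequence[float]              # hypothetical metrics: (loc, complexity, churn)
Classifier = Callable[[Instance], int]  # returns 1 (defective) or 0 (clean)


def max_vote(classifiers: List[Classifier], instance: Instance) -> int:
    """Return the label predicted by the largest number of base classifiers."""
    votes = Counter(clf(instance) for clf in classifiers)
    best = max(votes.values())
    # Assumption: a tie is resolved toward "defective" (label 1).
    return max(label for label, count in votes.items() if count == best)


def f_measure(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the defective class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical base classifiers: simple threshold rules standing in for
# trained models such as logistic regression or decision trees.
base_classifiers: List[Classifier] = [
    lambda x: int(x[0] > 300),  # large class
    lambda x: int(x[1] > 10),   # high complexity
    lambda x: int(x[2] > 50),   # heavily changed
]

# Toy target-project instances and labels (made up for this sketch).
instances = [(500, 15, 10), (100, 3, 5), (350, 4, 80), (200, 12, 20), (400, 20, 90)]
actual = [1, 0, 0, 1, 1]

predicted = [max_vote(base_classifiers, inst) for inst in instances]
print("predicted:", predicted)
print("F-measure:", round(f_measure(actual, predicted), 4))
```

In a real cross-project setting the base classifiers would be trained on labeled data from source projects and applied to an unlabeled target project; the voting step itself stays the same.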

Boosting-Based k-NN Learning for Software Defect Prediction

A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

Combined Classifier for Cross-Project Defect Prediction: an Extended Empirical Study

Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction

Software Defect Prediction Approach Based on a Diversity Ensemble Combined With Neural Network

A Novel Multiple Ensemble Learning Models Based on Different Datasets for Software Defect Prediction

Software Defect Prediction via Deep Belief Network

A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction

Studying the effectiveness of deep active learning in software defect prediction

Within-Project Defect Prediction

Software Defect Prediction Using an Intelligent Ensemble-Based Model

Multi-project Regression Based Approach for Software Defect Number Prediction

FSDNP: Feature Selection Method for Software Defect Number Prediction

Cascade Generalization-based Classifiers for Software Defect Prediction

Software defect prediction based on nested-stacking and heterogeneous feature selection

Software Defect Prediction Model Using AdaBoost based Random Forest Technique

Hybrid Optimization-Based Neural Network Classifier for Software Defect Prediction

A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling

Software Defect Prediction using Deep Learning by Correlation Clustering of Testing Metrics

Software Defect Prediction Using Dagging Meta-Learner-Based Classifiers

Software Defect Prediction with Bayesian Approaches