Abstract: To help developers allocate their testing and debugging effort effectively, many software defect prediction techniques have been proposed in the literature. These techniques predict which classes are likely to be buggy based on the past history of classes, methods, or other code elements. They are effective provided that sufficient data are available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained on data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate their performance, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. Using two standard evaluation metrics, cost effectiveness and F-measure, we compared the composite algorithms with the most recent cross-project defect prediction algorithm, the combined defect predictor that uses logistic regression as its meta classification algorithm (CODEP (Logistic)). Our experimental results show that several algorithms outperform CODEP (Logistic): maximum voting achieves the best F-measure, and its average F-measure is 36.88% higher than that of CODEP (Logistic); bootstrap aggregation with J48 (Bagging (J48)) achieves the best cost effectiveness, and its average cost effectiveness is 15.34% higher than that of CODEP (Logistic).
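The abstract names maximum voting and F-measure without defining them. The following is a minimal sketch under stated assumptions, not the study's implementation: each base classifier outputs a binary label (1 = defective, 0 = clean), maximum voting is taken to be a simple majority vote with ties broken toward "defective" (an assumption; the abstract does not specify tie handling), and F-measure is the usual harmonic mean of precision and recall. The helper `relative_improvement` assumes the reported percentages (e.g., 36.88%) are relative improvements over the CODEP (Logistic) baseline. All function names are hypothetical, and the toy labels are invented for illustration rather than taken from the PROMISE data.

```python
from collections import Counter
from typing import List, Sequence


def max_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier binary labels by majority vote.

    predictions[k][i] is classifier k's label for instance i
    (1 = defective, 0 = clean). Ties are broken toward 'defective';
    this tie rule is an assumption, not taken from the study.
    """
    n_instances = len(predictions[0])
    combined = []
    for i in range(n_instances):
        votes = Counter(clf[i] for clf in predictions)
        combined.append(1 if votes[1] >= votes[0] else 0)
    return combined


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the 'defective' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage by which `new` exceeds `baseline` (hypothetical helper)."""
    return (new - baseline) / baseline * 100


if __name__ == "__main__":
    # Invented labels from three hypothetical base classifiers.
    clf_a = [1, 0, 1, 1, 0, 0]
    clf_b = [1, 1, 0, 1, 0, 0]
    clf_c = [0, 0, 1, 1, 1, 0]
    y_true = [1, 0, 1, 1, 0, 1]

    combined = max_vote([clf_a, clf_b, clf_c])
    print(combined)  # [1, 0, 1, 1, 0, 0]

    f_vote = f_measure(y_true, combined)
    f_single = f_measure(y_true, clf_b)
    print(round(f_vote, 3), round(f_single, 3))  # 0.857 0.571
    print(round(relative_improvement(f_vote, f_single), 1))  # 50.0
```

Because the vote only needs each classifier's predicted labels, base learners trained on source projects can be combined on a target project without fitting anything further; CODEP (Logistic) instead feeds the base classifiers' outputs into a logistic regression meta classifier, which must itself be trained.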

Line-Level Defect Prediction by Capturing Code Contexts with Graph Convolutional Networks

Unifying Defect Prediction, Categorization, and Repair by Multi-Task Deep Learning

BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction

LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction

SyntaxLineDP: a Line-level Software Defect Prediction Model Based on Extended Syntax Information

Defect Prediction With Semantics and Context Features of Codes Based on Graph Representation Learning

Software Defect Prediction Based on Deep Representation Learning of Source Code From Contextual Syntax and Semantic Graph

Software visualization and deep transfer learning for effective software defect prediction

Deep Semantic Feature Learning for Software Defect Prediction

Deep Learning for Just-In-Time Defect Prediction

Predicting Line-Level Defects by Capturing Code Contexts with Hierarchical Transformers

Software Defect Prediction Based on Gated Hierarchical LSTMs

Combined Classifier for Cross-Project Defect Prediction: an Extended Empirical Study

JITLine: A Simpler, Better, Faster, Finer-grained Just-In-Time Defect Prediction

Deep learning or classical machine learning? An empirical study on line‐level software defect prediction

An Approach to Semantic and Structural Features Learning for Software Defect Prediction

Software Defect Prediction and Localization with Attention-Based Models and Ensemble Learning

Just‐in‐time Defect Prediction Enhanced by the Joint Method of Line Label Fusion and File Filtering

Seml: A Semantic LSTM Model for Software Defect Prediction

Multi‐graph Learning‐based Software Defect Location

Fine-Grained Software Defect Prediction Based on the Method-Call Sequence