Abstract: To help developers allocate their testing and debugging efforts effectively, many software defect prediction techniques have been proposed in the literature. These techniques predict which classes are more likely to be buggy based on the past history of classes, methods, or other code elements. They are effective provided that a sufficient amount of data is available to train a prediction model. However, sufficient training data are rarely available for new software projects. To resolve this problem, cross-project defect prediction, which transfers a prediction model trained on data from one project to another, was proposed and is regarded as a new challenge in the area of defect prediction. Thus far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this study we investigated seven composite algorithms that integrate multiple machine learning classifiers to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we performed experiments on 10 open-source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compared the composite algorithms with CODEP (Logistic), the combined defect predictor that uses logistic regression as the meta classification algorithm and the most recent cross-project defect prediction algorithm, in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP (Logistic). Maximum voting shows the best performance in terms of F-measure, and its average F-measure is superior to that of CODEP (Logistic) by 36.88%. Bootstrap aggregation (Bagging (J48)) shows the best performance in terms of cost effectiveness, and its average cost effectiveness is superior to that of CODEP (Logistic) by 15.34%.
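As a rough illustration of how a composite predictor and the F-measure fit together, the sketch below combines the labels of several base classifiers by simple plurality voting and then scores the result. It is not the study's implementation: the abstract does not specify how maximum voting combines classifier outputs (it may operate on confidence scores rather than labels), so the voting rule, the tie-breaking choice, the function names `plurality_vote` and `f_measure`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

DEFECTIVE = "defective"
CLEAN = "clean"


def plurality_vote(labels):
    """Combine one instance's base-classifier labels by plurality.

    Assumption: ties are broken in favor of DEFECTIVE. The abstract
    does not state a tie-breaking rule.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return DEFECTIVE if DEFECTIVE in winners else winners[0]


def f_measure(actual, predicted, positive=DEFECTIVE):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): each row holds the labels that three base
# classifiers assign to one instance of the target project.
base_predictions = [
    [DEFECTIVE, DEFECTIVE, CLEAN],
    [CLEAN, CLEAN, DEFECTIVE],
    [DEFECTIVE, CLEAN, CLEAN],
    [DEFECTIVE, DEFECTIVE, DEFECTIVE],
]
actual = [DEFECTIVE, CLEAN, DEFECTIVE, DEFECTIVE]

voted = [plurality_vote(row) for row in base_predictions]
score = f_measure(actual, voted)
print(voted, round(score, 2))
```

On this toy data the voted labels miss one defective instance, so recall drops below precision and the F-measure reflects both. In a cross-project setting the base classifiers would be trained on source-project data and applied to the target project; that step is omitted here.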
