Abstract: Software defects are a critical issue in software development that can lead to system failures and cause significant financial losses. Predicting software defects is therefore a vital part of ensuring software quality, and it can reduce both the time and the overall cost of software testing. During the software defect prediction (SDP) process, automated tools attempt to predict defects in source code based on software metrics. Several SDP models have been proposed to identify and prevent defects before they occur. In recent years, recurrent neural network (RNN) techniques have gained attention for their ability to handle sequential data and learn complex patterns. Still, these techniques are not always suitable for predicting software defects because of the problem of imbalanced data. To deal with this problem, this study combines a bidirectional long short-term memory (Bi-LSTM) network with oversampling techniques. To establish the effectiveness and efficiency of the proposed model, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean squared error (MSE). The average accuracy of the proposed model on the original datasets and on the datasets balanced with random oversampling and SMOTE was 88%, 94%, and 92%, respectively. Compared with the original datasets, the proposed Bi-LSTM therefore improves average accuracy by 6 percentage points with random oversampling and by 4 percentage points with SMOTE. The average F-measure of the proposed model on the original datasets and on the datasets balanced with random oversampling and SMOTE was 51%, 94%, and 92%, respectively, an improvement of 43 and 41 percentage points over the original datasets. The experimental results demonstrate that combining the Bi-LSTM network with oversampling techniques positively affects defect prediction performance on datasets with imbalanced class distributions.
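The gap between accuracy and F-measure on the original datasets, and the effect of random oversampling on class balance, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the 90/10 class split, and the helper names `random_oversample` and `accuracy_and_f_measure` are assumptions made for illustration, and no Bi-LSTM is trained here.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def accuracy_and_f_measure(y_true, y_pred, positive=1):
    """Return (accuracy, F-measure) with `positive` as the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f_measure


# Hypothetical toy dataset: 90 clean modules (label 0), 10 defective (label 1).
features = [[float(i)] for i in range(100)]
labels = [0] * 90 + [1] * 10

# A degenerate model that always predicts "clean" looks accurate on
# imbalanced data while finding no defects at all.
always_clean = [0] * len(labels)
acc, f1 = accuracy_and_f_measure(labels, always_clean)
print("imbalanced:", acc, f1)

# After random oversampling both classes have the same number of rows.
balanced_x, balanced_y = random_oversample(features, labels)
print("class counts after oversampling:", Counter(balanced_y))
```

SMOTE differs from this sketch in that it synthesizes new minority samples by interpolating between neighboring minority rows rather than duplicating existing ones.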

Software Defect Prediction Method Based on Hybrid Sampling

A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction

An Improved Semi-Supervised Learning Method for Software Defect Prediction.

Sample-based Software Defect Prediction with Active and Semi-Supervised Learning.

Software Defect Prediction Based on Hybrid Swarm Intelligence and Deep Learning

SHSE: A subspace hybrid sampling ensemble method for software defect number prediction

Combined Classifier for Cross-Project Defect Prediction: an Extended Empirical Study.

Adaptive Centre-Weighted Oversampling for Class Imbalance in Software Defect Prediction

A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

Support Vector based Oversampling Technique for Handling Class Imbalance in Software Defect Prediction

A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling

Tackling Class Imbalance Problem In Software Defect Prediction Through Cluster-Based Over-Sampling With Filtering

Diversity based multi-cluster over sampling approach to alleviate the class imbalance problem in software defect prediction

Software defect prediction using hybrid techniques: a systematic literature review

Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction

A hybrid‐ensemble model for software defect prediction for balanced and imbalanced datasets using AI‐based techniques with feature preservation: SMERKP‐XGB

Software defect prediction using a bidirectional LSTM network combined with oversampling techniques

Software defect prediction based on nested-stacking and heterogeneous feature selection

Hybrid Optimization-Based Neural Network Classifier for Software Defect Prediction

Performance evaluation of software defect prediction with NASA dataset using machine learning techniques

Software defect prediction model based on improved twin support vector machines