Abstract: Predicting when and where bugs will appear in software can help improve quality and reduce testing costs. Machine learning methods can be used to predict defects in individual software modules. However, software defect prediction datasets have two major problems: class imbalance (defective modules are far fewer than non-defective ones) and noisy attributes (a result of irrelevant features), both of which make accurate prediction difficult. If these two issues are left unaddressed, model performance suffers greatly, leading to overfitting and biased classification results. In this research, we propose a machine learning approach that improves the performance of the CatBoost and Gradient Boost classifiers for software defect prediction. Random Over Sampler addresses the class imbalance, and Mutual info classification handles feature selection. Eleven datasets from NASA's data repository, "Promise," were used in this study. We classified these 11 datasets using 10-fold cross-validation and evaluated the proposed methods on widely used measures: accuracy, precision, recall, F1 score, ROC values, RMSE, MSE, and MAE. On all 11 datasets, the proposed methods outperform the baseline classifiers by a significant margin. We also compared our model with other defect prediction methods and found that it outperformed them all. The experiments show that the proposed model achieves a higher detection rate than conventional models.
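The two preprocessing steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names (`random_oversample`, `mutual_information`, `select_top_k`) and the toy data are hypothetical, and the mutual information here is computed from simple counts over discrete features, whereas library estimators for continuous software metrics typically work differently. No classifier is trained; the sketch only shows how random oversampling balances the classes and how a mutual information score can rank features.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def mutual_information(feature, labels):
    """Mutual information (in nats) between one discrete feature and the label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: 8 non-defective modules (label 0), 2 defective (label 1).
# Feature 0 tracks the label, feature 1 is constant, feature 2 is noise.
rows = [(0, 1, i % 2) for i in range(8)] + [(1, 1, 0), (1, 1, 1)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
selected, scores = select_top_k(balanced_rows, balanced_labels, k=1)
print(Counter(balanced_labels), selected)
```

In a cross-validated evaluation such as the 10-fold setup described above, oversampling is usually applied to the training folds only, so that duplicated rows do not leak into the held-out fold; whether the study does this is not stated in the abstract.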

Research of software fault prediction based on PU learning

An Improved Semi-Supervised Learning Method for Software Defect Prediction.

An unsupervised defect prediction method based on probability

UDA-DP: Unsupervised Domain Adaptation for Software Defect Prediction

A systematic review of unsupervised learning techniques for software defect prediction

Sample-based Software Defect Prediction with Active and Semi-Supervised Learning.

Positive-Unlabeled Learning-Based Hybrid Deep Network for Intelligent Fault Detection

A Novel Imbalanced Data Classification Method Based on Weakly Supervised Learning for Fault Diagnosis

A Two-Stage Data Preprocessing Approach for Software Fault Prediction

Data-Based Line Trip Fault Prediction in Power Systems Using LSTM Networks and SVM.

Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning

High-fidelity Positive-Unlabeled Deep Learning for Semi-Supervised Fault Detection of Chemical Processes

A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction

A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

Use of Deep Learning Model with Attention Mechanism for Software Fault Prediction

Unsupervised Real Time Prediction of Faults Using the Support Vector Machine

PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision

Investigating Associative Classification for Software Fault Prediction: an Experimental Perspective.

Semi-Supervised Text Classification Using Positive and Unlabeled Data

Cross-Project and Within-Project Semi-Supervised Software Defect Prediction Problems Study Using a Unified Solution

Predicting the Number of Software Faults using Deep Learning