Deep Learning Based Continuous Integration and Continuous Delivery Software Defect Prediction with Effective Optimization Strategy

Anurag Mishra, Ashish Sharma
DOI: https://doi.org/10.1016/j.knosys.2024.111835
IF: 8.139
2024-04-20
Knowledge-Based Systems
Abstract: Software defect prediction is one of the most difficult tasks in the IT sector. Continuous Integration and Continuous Delivery (CI/CD) software defect prediction is applied at an earlier stage of development, which reduces the time consumed. Various learning algorithms, such as convolutional neural networks and other machine learning (ML) algorithms, are employed to forecast flaws in the software model. These existing algorithms have several issues, including high computational complexity, excessive time consumption, high energy requirements for prediction, and high loss. To address these issues, a novel deep learning (DL)-based CI/CD software defect prediction technique with an effective optimization strategy is proposed to improve the model's efficiency. Numerical data are collected from open sources and are first labeled based on time domain limitations. The Modified Synthetic Minority Over-Sampling Technique (M-SMOTE) is then applied to balance the data and avoid overfitting, and data normalization is performed to rescale the data. After normalization, the optimal set of features is extracted from the data using Focal Bidirectional Encoder Representations from Transformers (F-BERT) to enhance the efficiency of the model. Finally, software defects are predicted from the extracted features using a Bidirectional Long Short-Term Memory (Bi-LSTM) network integrated into a convolutional gated recurrent unit (GRU) model (Bi-CGRU). Hybrid Levy Rao (HLR) optimization is used to tune the hyperparameters of the classifier model. The proposed model's performance indicators are examined. The proposed model achieves 95.32% accuracy, 93.3% recall, a 94.98% Matthews correlation coefficient, and an F1-score of 91.355%. The proposed model also generates less labeling noise and wait time than existing methods.
computer science, artificial intelligence
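The abstract reports accuracy, recall, Matthews correlation coefficient (MCC), and F1-score. As a reminder of how these indicators are conventionally defined for a binary defect / no-defect classifier, here is a minimal sketch using the standard textbook formulas. The function name `classification_metrics` and the confusion-matrix counts in the usage example are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in the abstract.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. All results are fractions;
    multiply by 100 to express them as percentages.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC ranges from -1 to 1; it is set to 0 here when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper).
    metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because defect datasets are usually imbalanced, accuracy alone can look high even when few defects are found; MCC and F1 use more of the confusion matrix, which is presumably why the abstract reports them alongside accuracy.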