Software defect prediction ensemble learning algorithm based on 2-step sparrow optimizing extreme learning machine

Yu Tang, Qi Dai, Mengyuan Yang, Lifang Chen, Ye Du
DOI: https://doi.org/10.1007/s10586-024-04446-y
2024-05-19
Cluster Computing
Abstract: Software defect prediction is a crucial discipline within the software development life cycle. Accurately identifying defective modules saves developers time and cost. The extreme learning machine (ELM) offers rapid training and robust learning capability, and numerous researchers in software defect prediction have employed it. However, ELM, a single-hidden-layer feedforward neural network, suffers from random parameter selection and limited generalization ability. To enhance the predictive performance of ELM in software defect prediction, most researchers use swarm intelligence optimization algorithms to optimize extreme learning machines. However, these optimization methods can become trapped in local optima. This paper introduces a new sparrow search algorithm (2SSSA) built upon the original sparrow search algorithm. To improve the original algorithm's ability to escape local extrema, pinhole imaging reverse learning and somersault foraging strategies are employed. The optimization performance and convergence speed of 2SSSA are assessed using 8 randomly selected benchmark functions and 8 CEC2017 functions. Additionally, ensemble learning is a prominent research focus in software defect prediction and is known for its ability to significantly enhance prediction performance and model generalization. Accordingly, the ELM optimized using 2SSSA serves as the base predictor in a bagging ensemble learning algorithm. We propose an ensemble algorithm for software defect prediction, denoted 2SSEBA, which employs the 2-step optimization sparrow algorithm (2SSSA) to optimize extreme learning machines. In an evaluation on 25 publicly available software defect prediction datasets using 5 commonly employed metrics, the predictive performance of 2SSEBA significantly outperforms that of five other advanced prediction algorithms. This conclusion is further supported by both the Friedman ranking and Holm's post-hoc test.
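To make the two building blocks named in the abstract concrete, the following is a minimal sketch of a basic ELM used as the base learner in a bagging ensemble. It is not the authors' 2SSEBA implementation: the hidden-layer weights here are simply drawn at random (the step the paper replaces with 2SSSA optimization), and all names (`ELM`, `bagging_fit`, `bagging_predict`), the sigmoid activation, the hyperparameter values, and the toy data are assumptions made for illustration.

```python
import numpy as np


class ELM:
    """Basic extreme learning machine for binary labels (0/1). Illustrative only."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are random and never trained;
        # the paper tunes these with 2SSSA instead (not shown here).
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def bagging_fit(X, y, n_estimators=10, n_hidden=20, seed=0):
    """Train one ELM per bootstrap sample (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(ELM(n_hidden, seed=seed + i + 1).fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Majority vote over the base learners (ties resolve to class 1)."""
    votes = np.mean([m.predict(X) for m in models], axis=0)
    return (votes >= 0.5).astype(int)


# Toy data standing in for a defect dataset (hypothetical: 1 = defective).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

models = bagging_fit(X, y)
pred = bagging_predict(models, X)
print("training accuracy:", (pred == y).mean())
```

Because only the output weights are solved for, in closed form, training is fast, which is the benefit the abstract attributes to ELM. The random draw of `W` and `b` is also the source of the instability that motivates optimizing those parameters and averaging several learners.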
computer science, information systems, theory & methods