SWEP-RF: Accuracy Sliding Window-based Ensemble Pruning Method for Latent Sector Error Prediction in Cloud Storage Computing.

Adnan Tahir, Fei Chen, Abdulwahab Ali Almazroi, Nourah Fahad Janbi
DOI: https://doi.org/10.1016/j.jksuci.2023.101672
IF: 9.006
2023-07-30
Journal of King Saud University - Computer and Information Sciences
Abstract: Latent sector errors (LSEs) in disk drives cause significant outages, data loss, and unreliability in large-scale cloud storage systems. Predicting LSEs can help avoid these problems and improve cloud reliability. Ensemble classifiers typically outperform individual classifiers for LSE prediction, achieving high accuracy, but they can lead to overfitting and incur additional computational cost, complexity, and time and memory consumption. This research addresses this challenge with a twofold solution: optimizing the ensemble diversity of the resulting Random Forest (RF) classifier through accuracy sliding window-based ensemble pruning (SWEP-RF), and using the pruned ensemble to predict LSEs in cloud storage. SWEP-RF maximizes its lower margin distribution to adapt the RF prediction performance and produce a strong-performing, effective subensemble. Our approach also reduces ensemble size while maintaining high prediction accuracy. We evaluate our algorithm using datasets from Baidu Inc. and Backblaze datacenters. Experimental results demonstrate that our approach achieves over 98.6% prediction accuracy, a low false alarm rate (FAR) of 0.003%, and an extended mean time to data loss (MTTDL) with lead time in advance (LTA) of up to 383.4 hours and 474.3 hours, respectively. SWEP-RF outperforms classical models and current state-of-the-art techniques in prediction accuracy, FAR, MTTDL, processing time, memory consumption, and cloud availability. Our method is a promising solution for enhancing cloud storage reliability through proactive LSE prediction.
computer science, information systems