ADF2T: an Active Disk Failure Forecasting and Tolerance Software

Hongzhang Yang, Yahui Yang, Zhengguang Chen, Zongzhao Li, Yaofeng Tu
DOI: https://doi.org/10.1109/issrew51248.2020.00030
2020-01-01
Abstract: The reliability of a distributed file system is inevitably affected by hard disk failures. This paper proposes ADF2T, an active disk failure forecasting and tolerance software. First, the multiple SMART records within a time window are merged into one sample, and sliding the window increases the number of positive samples by tens of times. Second, features are selected by a two-stage sorting method, so that only the most useful features are used in machine learning modeling and model training time is noticeably shortened. Third, two-stage verification allows the parameters of unreasonable proactive reconstruction strategies to be adjusted in time. In experiments on the ZTE and Backblaze data sets, the recall rates are 95.66% and 84.28%, and the error rates are 0.23% and 2.45%, respectively. The software has been in commercial use in a ZTE data center for more than one year, and the reliability of the distributed file system software has improved significantly.
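
The sliding-window step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sliding_window_samples`, the `window` and `stride` parameters, the choice to merge records by simple concatenation, and the example SMART values are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def sliding_window_samples(
    records: Sequence[Sequence[float]],
    window: int,
    stride: int = 1,
) -> List[List[float]]:
    """Merge each run of `window` consecutive records into one flat sample.

    Assumption: "merging" is modeled as concatenating the records in the
    window; the paper may combine them differently.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    samples = []
    # Slide the window over the record sequence, one stride at a time.
    for start in range(0, len(records) - window + 1, stride):
        merged = [value for rec in records[start:start + window] for value in rec]
        samples.append(merged)
    return samples


# Hypothetical daily SMART records for one failed disk:
# [reallocated sector count, pending sector count]
failed_disk = [[0, 0], [1, 0], [3, 1], [7, 2], [12, 5]]

# Five records with a window of three yield three positive samples
# instead of the single sample a non-sliding window would give.
samples = sliding_window_samples(failed_disk, window=3)
print(len(samples), samples[0])
```

With longer record histories and a small stride, the same idea multiplies the number of positive samples available from each failed disk, which is the effect the abstract attributes to sliding.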