ZTE-Predictor: Disk Failure Prediction System Based on LSTM

Hongzhang Yang, Zongzhao Li, Huiyuan Qiang, Zhongliang Li, Yaofeng Tu, Yahui Yang
DOI: https://doi.org/10.1109/DSN-S50200.2020.00017
2020-01-01
Abstract: Disk failure prediction has become an active research topic in both academia and industry because it is important for improving data center reliability. This paper studies ZTE's disk SMART (Self-Monitoring, Analysis and Reporting Technology) data set and aims to predict whether a disk will fail within 5-7 days. In the model training stage, each disk state is labeled as either normal or failing within 5 days. The positive and negative samples are then balanced using both over-sampling and under-sampling. Finally, an LSTM (Long Short-Term Memory) network is trained on the balanced data set to obtain the disk failure prediction model. In experiments on ZTE's historical data set, the best FDR (Fault Detection Rate) is 97.4% and the FAR (False Alarm Rate) is 0.3%. After 7 months of deployment in ZTE's data center, the best FDR is 94.5% and the FAR is 0.7%.
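The labeling, balancing, and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `days_to_failure` input, and the `target_per_class` parameter are hypothetical, the LSTM training step is omitted, and FDR and FAR follow the definitions commonly used in disk failure prediction (detected failures over all failures, and false alarms over all healthy samples), which the paper may compute per disk rather than per sample.

```python
import random


def label_samples(days_to_failure, horizon=5):
    """Label a sample 1 if its disk fails within `horizon` days, else 0.

    `days_to_failure` (hypothetical input) holds, per sample, the number of
    days until the disk failed, or None for a disk that never failed.
    """
    return [1 if d is not None and 0 <= d <= horizon else 0
            for d in days_to_failure]


def _resample(group, target, rng):
    """Shrink or grow one class to exactly `target` samples."""
    if not group:
        raise ValueError("cannot resample an empty class")
    if len(group) >= target:
        # Under-sampling: draw without replacement.
        return rng.sample(group, target)
    # Over-sampling: keep every sample, then duplicate random ones.
    return group + [rng.choice(group) for _ in range(target - len(group))]


def rebalance(samples, labels, target_per_class, seed=0):
    """Balance the two classes by over- and under-sampling."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    pos = _resample(pos, target_per_class, rng)
    neg = _resample(neg, target_per_class, rng)
    balanced = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def fdr_far(y_true, y_pred):
    """Return (FDR, FAR), assuming FDR = TP/(TP+FN) and FAR = FP/(FP+TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one failed and one healthy sample")
    return tp / (tp + fn), fp / (fp + tn)
```

In practice the balanced samples would be sequences of SMART attribute vectors fed to an LSTM classifier, and `fdr_far` would be applied to that model's predictions on held-out data.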