Tree-Based Model with Advanced Data Preprocessing for Large Scale Hard Disk Failure Prediction

Qi Wu, Weilong Chen, Wei Bao, Jipeng Li, Peikai Pan, Qiyao Peng, Pengfei Jiao
DOI: https://doi.org/10.1007/978-981-15-7749-9_9
2020-01-01
Abstract: As the scale of data in data centers expands, hard drives are widely used in computer systems. In practice, however, hard disk failures occur frequently. As usage time increases, the stability and accuracy of a hard disk steadily decline, which negatively affects the normal operation of the system. Yet there is no research on estimating hard disk quality across the industry. In this article, we use Generative Adversarial Networks (GANs) for data augmentation and a CatBoost model to predict disk failure. This approach achieved tenth place in the PAKDD2020 Alibaba intelligent operation and maintenance algorithm competition on large-scale hard disk failure prediction [1].