Temporal-Contextual Attention Network for Solid-State Drive Failure Prediction in Data Centers

Chankyu Koh, Jisu Kang, Taehyeong Kim, Sung Won Han
DOI: https://doi.org/10.1109/access.2024.3482368
IF: 3.9
2024-10-29
IEEE Access
Abstract: Proactive strategies for predicting solid-state drive (SSD) failures are imperative to ensure uninterrupted services in data centers. Traditional methods that rely on rule-based approaches and machine learning algorithms often fail to accurately predict these failures. In this study, we introduce a temporal-contextual attention network (TCAN), a pioneering method that integrates long short-term memory (LSTM) and transformer architectures to address the limitations of existing schemes. TCAN exploits temporal patterns and attribute dependencies and offers a more comprehensive solution for SSD failure prediction. Unlike conventional feature selection methods, TCAN adopts a feature grouping approach that utilizes all available attributes while accounting for their unique characteristics. Specifically, TCAN treats certain features based on their temporal aspects, capturing how these features change over time, whereas other features are used to capture the dependencies and interactions among different attributes. Through extensive evaluation on private datasets from the Tencent data center and comparison with state-of-the-art models, including machine learning and deep learning approaches, TCAN demonstrates superior performance in identifying potential failures. Furthermore, ablation studies and evaluations using public datasets validate its effectiveness and robustness across different datasets. Our findings underscore the importance of considering both temporal features and inter-attribute dependencies for accurate SSD failure prediction, highlighting the potential of TCAN for enhancing storage system reliability and service stability in data center environments.
Categories: computer science, information systems; telecommunications; engineering, electrical & electronic