Dynamic Adjustment of Disk Redundancy and Scrubbing Strategy with Reinforcement Learning

Bo Su, Hui Xu, Gang Hu, Jie Shao
DOI: https://doi.org/10.1109/tii.2024.3397398
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract: Latent sector errors can cause problems such as data loss and performance degradation in a storage system, so identifying failing disks is essential for storage reliability. At the same time, improving storage efficiency requires dynamically adjusting the redundancy settings (such as erasure coding parameters) and the disk scrubbing rate (how often disks are scanned to detect and repair sector errors) according to the health level of different brands or types of disks. However, current work studies these two aspects separately without considering their interactions in a unified system. Moreover, balancing the conflicting objectives of increasing reliability and reducing cost is a challenge. In this work, we propose a deep learning model for health prediction and a reinforcement learning model for strategy adjustment to solve these problems. The health prediction part trains a health prediction model for disks of different brands, which outputs the health level of each disk and calculates the overall health level of each brand. The strategy adjustment part employs a threshold-based reinforcement learning framework, Th-DQN, to dynamically adjust the redundancy settings and scrubbing rate of disks from a given brand in accordance with that brand's overall health level. Experimental results on datasets from different brands show that the proposed method can select suitable strategies for different health levels while satisfying the target reliability requirement and minimizing the system cost.