A Hierarchical RAID Architecture Towards Fast Recovery and High Reliability
Yongkun Li, Neng Wang, Chengjin Tian, Si Wu, Yueming Zhang, Yinlong Xu
DOI: https://doi.org/10.1109/tpds.2017.2775231
IF: 5.3
2018-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Disk failures are very common in modern storage systems because these systems contain large numbers of inexpensive disks. Moreover, recovering a failed disk takes a long time because of its large capacity and limited I/O bandwidth. To speed up recovery while maintaining high system reliability, we propose OI-RAID, a hierarchical architecture built from erasure codes in two layers: an outer layer code and an inner layer code. The outer layer code is deployed with a disk grouping technique based on a Balanced Incomplete Block Design (BIBD) or a complete graph, together with a skewed data layout, so that all disks contribute parallel I/O for fast failure recovery. The inner layer code is deployed within each group of disks to provide high reliability. As an example, we deploy RAID5 in both layers, which tolerates at least three disk failures, meeting the data availability requirements of practical systems, and achieves a much higher recovery speedup ratio than existing approaches. In addition, OI-RAID retains optimal data update complexity and incurs low storage overhead in practice.
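Why a BIBD helps recovery: in a (v, k, λ) BIBD every pair of distinct points appears together in exactly λ blocks, and each point lies in r blocks with r(k − 1) = λ(v − 1). With λ = 1, the r blocks containing a failed point touch each of the other v − 1 points exactly once, so a rebuild can draw evenly on every surviving unit instead of a few. The sketch below checks this on the Fano plane, a (7, 3, 1)-BIBD chosen only for illustration. Treating points as disk groups and blocks as outer-layer stripes is an assumption made for the example; the helper names (`FANO_BLOCKS`, `is_bibd`, `recovery_reads`) are hypothetical, and the paper's skewed data layout and actual parameters are not modeled.

```python
from collections import Counter
from itertools import combinations

# Illustrative (v=7, k=3, lambda=1) BIBD: the Fano plane.
# Assumption for this sketch: points 0..6 stand for disk groups and each
# block stands for one outer-layer stripe spanning k groups.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def is_bibd(blocks, v, k, lam):
    """Return True if every block holds k distinct points from range(v) and
    every pair of distinct points appears together in exactly lam blocks."""
    points = set(range(v))
    if any(len(set(b)) != k or not set(b) <= points for b in blocks):
        return False
    pair_counts = Counter(
        pair for b in blocks for pair in combinations(sorted(b), 2)
    )
    return all(pair_counts[p] == lam for p in combinations(range(v), 2))


def recovery_reads(blocks, failed):
    """For each surviving point, count the blocks it shares with the failed
    point, i.e. how many stripes it must supply data for during a rebuild."""
    reads = Counter()
    for b in blocks:
        if failed in b:
            for p in b:
                if p != failed:
                    reads[p] += 1
    return dict(reads)


if __name__ == "__main__":
    print(is_bibd(FANO_BLOCKS, v=7, k=3, lam=1))  # True
    print(recovery_reads(FANO_BLOCKS, failed=0))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}
```

Because λ = 1, every surviving point contributes exactly one stripe's worth of reads when point 0 fails, which is the even, all-disk parallelism the outer layer aims for. Dropping any block breaks the pair-balance property, and some surviving points would then sit idle during recovery.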