D3: Deterministic Data Distribution for Efficient Data Reconstruction in Erasure-Coded Distributed Storage Systems

Zhipeng Li, Min Lv, Yinlong Xu, Yongkun Li, Liangliang Xu
DOI: https://doi.org/10.1109/ipdps.2019.00064
2019-05-01
Abstract: Because individual commodity components are unreliable, failures are common in large-scale distributed storage systems. Erasure codes are widely deployed in practical storage systems to provide fault tolerance with low storage overhead. However, the random data placement commonly used in erasure-coded storage systems induces heavy cross-rack traffic, load imbalance, and random access, all of which slow down the recovery process upon failures. In this paper, using orthogonal arrays, we define a Deterministic Data Distribution (D3) of blocks to nodes and racks, and propose an efficient failure recovery approach based on D3. D3 not only distributes data and parity blocks uniformly among storage servers, but also balances the repair traffic among racks and storage servers during failure recovery. Furthermore, D3 minimizes the cross-rack repair traffic for data layouts that tolerate a single rack failure, and provides sequential access for failure recovery. We implement D3 in the Hadoop Distributed File System (HDFS) on a cluster of 28 machines. Our experiments show that D3 significantly speeds up the failure recovery process compared with random data distribution, e.g., by 2.21 times for the (6, 3)-RS code in a system consisting of eight racks with three nodes in each rack.
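The abstract names orthogonal arrays as the building block of D3 without defining them. As a minimal sketch of the concept only, the Python below builds a strength-2 orthogonal array with the textbook linear construction (valid for a prime number of symbols `q`) and checks the defining balance property: in any two columns, every ordered pair of symbols appears equally often. The function names and the construction are illustrative assumptions; the abstract does not say which arrays D3 uses or how rows and columns map to blocks, nodes, and racks, so this does not reproduce the paper's layout.

```python
from collections import Counter
from itertools import combinations, product


def orthogonal_array(q: int, num_cols: int) -> list[tuple[int, ...]]:
    """Build a q*q-row array over symbols 0..q-1 (hypothetical helper).

    Textbook linear construction: row (a, b) holds (a + c*b) mod q in
    column c. It yields a strength-2 orthogonal array only when q is
    prime and num_cols <= q.
    """
    if num_cols > q:
        raise ValueError("this construction supports at most q columns")
    return [
        tuple((a + c * b) % q for c in range(num_cols))
        for a, b in product(range(q), repeat=2)
    ]


def is_strength_two(rows: list[tuple[int, ...]], q: int) -> bool:
    """Check that every pair of columns shows each symbol pair equally often."""
    num_cols = len(rows[0])
    expected = len(rows) // (q * q)  # occurrences required per ordered pair
    for c1, c2 in combinations(range(num_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != q * q or set(counts.values()) != {expected}:
            return False
    return True


if __name__ == "__main__":
    rows = orthogonal_array(3, 3)
    for row in rows:
        print(row)
    print("strength 2:", is_strength_two(rows, 3))
```

The balance property is what makes such arrays attractive for deterministic placement: if symbols stand for placement choices, no pair of choices is over- or under-represented, which is the kind of uniformity the abstract claims for blocks and repair traffic.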