Heterogeneity-aware Data Regeneration in Distributed Storage Systems

Yan Wang, Dongsheng Wei, Xunrui Yin, Xin Wang
DOI: https://doi.org/10.1109/infocom.2014.6848127
2014-01-01
Abstract: Distributed storage systems provide large-scale, reliable data storage services by spreading redundancy across a large group of storage nodes. In such large systems, node failures occur regularly. When a node fails or leaves the system, the redundant data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary widely, minimizing network traffic does not always minimize regeneration time. To account for heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, that bypasses the low-capacity links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data that must be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm that finds a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can significantly reduce the regeneration time.
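The point that a tree can bypass a low-capacity direct link can be illustrated with a small sketch. This is not the paper's algorithm or RCTREE itself. It is a simplified widest-path (maximum-bottleneck) computation under the assumption that a provider sends a fixed amount of data and that transfer time is that amount divided by the slowest link on the route. The node names, link capacities, and data size are hypothetical, and the sketch ignores the per-link data amounts the paper derives to preserve data integrity.

```python
import heapq


def widest_path_capacity(capacity, source, target):
    """Return the largest bottleneck capacity over all paths source -> target.

    capacity maps each node to a dict {neighbor: link capacity}.
    Returns 0.0 if target is unreachable.
    """
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            return width
        if width < best.get(node, 0):
            continue  # stale heap entry
        for neighbor, cap in capacity.get(node, {}).items():
            new_width = min(width, cap)
            if new_width > best.get(neighbor, 0):
                best[neighbor] = new_width
                heapq.heappush(heap, (-new_width, neighbor))
    return 0.0


# Hypothetical capacities in MB/s: providers A and B, replacement node R.
links = {
    "A": {"R": 10, "B": 80},
    "B": {"R": 100, "A": 80},
    "R": {"A": 10, "B": 100},
}

data_mb = 400  # hypothetical amount sent by provider A
direct_time = data_mb / links["A"]["R"]
relayed_time = data_mb / widest_path_capacity(links, "A", "R")
print(f"direct: {direct_time:.1f} s, relayed via tree: {relayed_time:.1f} s")
```

In this toy network the direct link from A to R is the bottleneck, while relaying through B uses links of capacity 80 and 100. A real regeneration tree must also account for how much coded data each link has to carry, which is the quantity the paper derives.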