Demand-Aware Erasure Coding for Distributed Storage Systems

Jun Li, Baochun Li
DOI: https://doi.org/10.1109/tcc.2018.2885306
IF: 5.697
2021-04-01
IEEE Transactions on Cloud Computing
Abstract: Distributed storage systems provide cloud storage services by storing data on commodity storage servers. Conventionally, data are protected against failures of such commodity servers by replication. Erasure coding incurs less storage overhead than replication while tolerating the same number of failures, and has therefore been replacing replication in many distributed storage systems. However, with erasure coding, the overhead of reconstructing data after failures also increases significantly. Under ever-changing workloads, where data accesses can be highly skewed, it is challenging to deploy erasure coding with parameter values that achieve a good trade-off between storage overhead and reconstruction overhead. In this paper, we propose Zebra, a framework that encodes data according to their demand into multiple tiers, which deploy erasure codes with different parameter values. Zebra automatically determines the number of such tiers and dynamically assigns erasure codes with optimal parameter values to the corresponding tiers. With multiple tiers, Zebra achieves a flexible trade-off between storage overhead and reconstruction overhead. When demand changes, Zebra adapts itself with only a marginal amount of network transfer. We demonstrate that Zebra can work with two representative families of erasure codes in distributed storage systems: Reed-Solomon codes and local reconstruction codes.
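As a rough illustration of the trade-off described in the abstract, the sketch below compares storage overhead and single-block repair cost for Reed-Solomon (RS) codes and local reconstruction codes (LRC), then evaluates a two-tier split in which frequently accessed objects use a code with cheaper reconstruction. This is not Zebra's algorithm. The `RS`/`LRC` classes, the hypothetical `two_tier_cost` helper, the fixed demand threshold, the assumption of equal-sized objects, and the demand values are all illustrative assumptions. Repair cost is modeled simply as the number of blocks read to rebuild one lost data block.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class RS:
    """Reed-Solomon code: k data blocks plus r parity blocks; tolerates any r failures."""
    k: int
    r: int

    def storage_overhead(self) -> float:
        # Blocks stored per data block.
        return (self.k + self.r) / self.k

    def repair_reads(self) -> int:
        # Conventional RS repair reads k surviving blocks to rebuild one lost block.
        return self.k


@dataclass(frozen=True)
class LRC:
    """Local reconstruction code: k data blocks split into l local groups,
    each with one local parity, plus r global parities (l follows common notation)."""
    k: int
    l: int
    r: int

    def __post_init__(self) -> None:
        # This simplified model assumes equal-sized local groups.
        if self.k % self.l != 0:
            raise ValueError("k must be divisible by l")

    def storage_overhead(self) -> float:
        return (self.k + self.l + self.r) / self.k

    def repair_reads(self) -> int:
        # A lost data block is rebuilt from the other k/l - 1 data blocks in its
        # local group plus that group's local parity: k/l reads in total.
        return self.k // self.l


Code = Union[RS, LRC]


def two_tier_cost(demand: Sequence[float], hot: Code, cold: Code,
                  threshold: float) -> Tuple[float, float]:
    """Hypothetical two-tier placement (not Zebra's policy): objects with
    demand >= threshold use `hot`, the rest use `cold`. Assumes equal-sized objects.

    Returns (mean storage overhead, demand-weighted mean repair reads).
    """
    total_demand = sum(demand)
    storage = 0.0
    repair = 0.0
    for d in demand:
        code = hot if d >= threshold else cold
        storage += code.storage_overhead()
        repair += d * code.repair_reads()  # hot objects dominate repair traffic
    return storage / len(demand), repair / total_demand


if __name__ == "__main__":
    # Hypothetical, highly skewed demand (e.g., accesses per object).
    demand = [100, 50, 5, 2, 1, 1, 1]
    hot, cold = RS(k=4, r=2), RS(k=12, r=2)  # both tolerate any 2 failures
    lrc = LRC(k=12, l=2, r=2)
    scenarios = {
        "RS(12,2) only": (cold, cold, 0),
        "two tiers": (hot, cold, 50),
        "RS(4,2) only": (hot, hot, 0),
        "LRC(12,2,2) only": (lrc, lrc, 0),
    }
    for label, (h, c, t) in scenarios.items():
        overhead, reads = two_tier_cost(demand, h, c, t)
        print(f"{label}: storage {overhead:.3f}x, weighted repair reads {reads:.2f}")
```

With this demand vector, the two-tier split reads 4.5 blocks per demand-weighted repair at a mean storage overhead of about 1.26x. That is close to the repair cost of RS(4,2) alone (4 blocks at 1.5x), while its storage overhead is closer to that of RS(12,2) alone (about 1.17x, but 12 blocks per repair). Zebra itself determines the number of tiers and their parameters automatically; this sketch fixes both by hand.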
computer science, information systems, theory & methods