ESetStore: An Erasure-Coded Storage System With Fast Data Recovery

Chengjian Liu, Qiang Wang, Xiaowen Chu, Yiu-Wing Leung, Hai Liu
DOI: https://doi.org/10.1109/tpds.2020.2983411
IF: 5.3
2020-09-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Erasure codes have been used extensively in large-scale storage systems to reduce the storage overhead of triplication-based storage systems. One key performance issue introduced by erasure codes is the long time needed to recover from a single failure, which occurs constantly in large-scale storage systems. We present ESetStore, a prototype erasure-coded storage system that aims to achieve fast recovery from failures. ESetStore is novel in the following aspects. We propose a data placement algorithm named ESet that aggregates adequate I/O resources from available storage servers to recover from each single failure. We design and implement efficient read and write operations on our erasure-coded storage system via effective use of available I/O and computation resources. We evaluate the performance of ESetStore with extensive experiments on a cluster with 50 storage servers. The evaluation results demonstrate that recovery performance grows linearly as more available I/O resources are harvested. With our defined parameter, recovery I/O parallelism, and under some mild conditions, we can achieve optimal recovery performance, in which ESet enables minimal recovery time. Rather than being an alternative to existing approaches for improving recovery performance, our work can serve as an enhancement for existing solutions, such as Partial-parallel-repair (PPR), to further improve recovery performance.
computer science, theory & methods; engineering, electrical & electronic