Dominoes: Speculative Repair in Erasure-Coded Hadoop System.

Xi Yang, Chen Feng, Zhiwei Xu, Xian-He Sun
DOI: https://doi.org/10.1109/hipc.2015.39
2015-01-01
Abstract: Data volume grows dramatically in the era of big data. To save capital cost on storage hardware, datacenters now prefer erasure coding over plain replication to protect against data loss. Erasure coding can provide fault tolerance equivalent to HDFS's default three-way replication, but it degrades data availability for task scheduling. In an erasure-coded system, a task that accesses a missing block during MapReduce job processing must pay the data reconstruction time. Tasks that access corrupt data become stragglers and degrade resource utilization. To overcome these challenges, we propose a novel mechanism, Dominoes, that coordinates lightweight data-state checking with job scheduling to hide this recovery penalty during job processing and to improve job throughput. Experimental results confirm Dominoes' effectiveness and efficiency: it improves job throughput by 9% to 9.7% under failure, at an overhead of 2.6% for failure-free jobs.