CaCo: an Efficient Cauchy Coding Approach for Cloud Storage Systems
Guangyan Zhang, Guiyong Wu, Shupeng Wang, Jiwu Shu, Weimin Zheng, Keqin Li
DOI: https://doi.org/10.1109/tc.2015.2428701
2016-01-01
Abstract: Users of cloud storage usually assign different redundancy configurations (i.e., the (k, m, w) parameters of erasure codes), depending on the desired balance between performance and fault tolerance. Our study finds that a coding scheme chosen by rules of thumb for a given redundancy configuration is very unlikely to be the best-performing one. In this paper, we propose CaCo, an efficient Cauchy coding approach for data storage in the cloud. First, CaCo uses Cauchy matrix heuristics to produce a matrix set. Second, for each matrix in this set, CaCo uses XOR schedule heuristics to generate a series of schedules. Finally, CaCo selects the shortest one from all the produced schedules. In this way, CaCo can identify an optimal coding scheme, within the capability of the current state of the art, for an arbitrary given redundancy configuration. Because CaCo is easy to parallelize, we significantly speed up the selection process by using the abundant computational resources in the cloud. We implement CaCo in the Hadoop distributed file system and evaluate its performance by comparing it with "Hadoop-EC" developed by Microsoft Research. Our experimental results indicate that CaCo can obtain an optimal coding scheme within an acceptable time. Furthermore, CaCo outperforms Hadoop-EC by 26.68-40.18 percent in encoding time and by 38.4-52.83 percent in decoding time.
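The three-step selection described in the abstract (build a set of Cauchy matrices, derive an XOR schedule for each, keep the shortest) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: the candidate set is a simple enumeration of X sets rather than the paper's matrix heuristics, the only schedule considered is the naive one (a bit-matrix row with n ones costs n - 1 XORs), and all function names (`candidate_matrices`, `naive_xor_count`, `select_best`) and the `limit` parameter are hypothetical.

```python
from itertools import combinations, islice

# Primitive polynomials for GF(2^w); only a few small w are covered (assumption).
PRIM_POLY = {3: 0b1011, 4: 0b10011, 8: 0b100011101}

def gf_mul(a, b, w):
    """Multiply a and b in GF(2^w): shift-and-add with polynomial reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> w:              # degree reached w, so reduce
            a ^= PRIM_POLY[w]
        b >>= 1
    return result

def gf_inv(a, w):
    """Multiplicative inverse by brute force (adequate for small w)."""
    return next(x for x in range(1, 1 << w) if gf_mul(a, x, w) == 1)

def cauchy_matrix(xs, ys, w):
    """m x k Cauchy matrix with entries 1 / (x_i + y_j); addition is XOR."""
    return [[gf_inv(x ^ y, w) for y in ys] for x in xs]

def bit_ones(e, w):
    """Ones in the w x w bit matrix of e; column j holds the bits of e * 2^j."""
    return sum(bin(gf_mul(e, 1 << j, w)).count("1") for j in range(w))

def naive_xor_count(matrix, w):
    """XORs used by the naive schedule: total ones minus number of bit rows."""
    ones = sum(bit_ones(e, w) for row in matrix for e in row)
    return ones - len(matrix) * w

def candidate_matrices(k, m, w, limit=50):
    """Hypothetical candidate set: up to `limit` disjoint (X, Y) choices."""
    elems = range(1 << w)
    for xs in islice(combinations(elems, m), limit):
        ys = [e for e in elems if e not in xs][:k]
        yield xs, ys, cauchy_matrix(xs, ys, w)

def select_best(k, m, w, limit=50):
    """Return (xor_count, xs, ys) for the cheapest candidate found."""
    return min(
        ((naive_xor_count(mat, w), xs, ys)
         for xs, ys, mat in candidate_matrices(k, m, w, limit)),
        key=lambda t: t[0],
    )

if __name__ == "__main__":
    cost, xs, ys = select_best(k=4, m=2, w=4)
    print("XORs:", cost, "X:", xs, "Y:", ys)
```

Each candidate is scored independently of the others, so the loop in `select_best` could be spread across many workers, which is the property the abstract relies on to speed up selection.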