Minimum Storage BASIC Codes: A System Perspective

Xianxia Huang, Hui Li, Tai Zhou, Yumeng Zhang, Han Guo, Hanxu Hou, Huayu Zhang, Kai Lei
DOI: https://doi.org/10.1109/bigdata.2013.6691660
2013-01-01
Abstract: The explosion of big data stored in distributed file systems calls for more efficient storage paradigms. While replication is widely used to ensure data availability, erasure codes provide a much better tradeoff between storage and availability. Reed-Solomon (RS) codes are the standard design choice; however, their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. BASIC codes, first proposed in [1], can achieve the optimal tradeoff between storage capacity and repair bandwidth with much lower complexity than regenerating codes. This paper integrates one construction of the minimum storage BASIC (MS-BASIC) codes [2] into a Hadoop HDFS cluster testbed with up to 22 storage nodes. We demonstrate that MS-BASIC codes conform to the theoretical findings and achieve recovery bandwidth savings compared to the conventional recovery approach based on RS codes.