STORE: Data recovery with approximate minimum network bandwidth and disk I/O in distributed storage systems

Zhou Tai, Li Hui, Zhu Bing, Zhang Yumeng, Hou Hanxu, Chen Jun
DOI: https://doi.org/10.1109/BigData.2014.7004381
2014-01-01
Abstract: Recently, traditional erasure codes such as Reed-Solomon (RS) codes have been increasingly deployed in distributed storage systems to reduce the large storage overhead incurred by the widely adopted replication scheme. However, these codes consume significantly more network bandwidth and disk I/O when recovering missing or unavailable data. This is referred to as the recovery problem. In this paper, we focus on integrating exact minimum bandwidth regenerating codes into practical systems to solve the recovery problem. We design an implementation-friendly storage code, based on the recently proposed BASIC framework and ZigZag decodable code, that saves recovery bandwidth and disk I/O. We build a system called STORE based on this code and evaluate our prototype atop an HDFS cluster testbed with 21 nodes. As shown in this paper, with STORE the recovery bandwidth is approximately minimal when recovering both data blocks and parity blocks. Another attractive result is that the recovery disk I/O is also approximately minimal when recovering data blocks. Owing to the reduction in recovery bandwidth and disk I/O, the degraded read throughput is boosted notably. © 2014 IEEE.