Erasure-Coded Hybrid Writes Based on Data Delta

Bing Wei, Qiang Huang, Hui Chen, Chenhao Zhang, Limin Xiao
DOI: https://doi.org/10.1007/s10766-024-00773-0
2024-05-25
International Journal of Parallel Programming
Abstract: Erasure coding is extensively deployed in today's data centers to tolerate prevalent failures, because it offers higher reliability at lower storage overhead than data replication. However, for each small write, erasure-coded storage systems have to perform a partial write to an entire erasure coding group, resulting in a time-consuming write-after-read. This paper presents DABRI, an erasure-coded hybrid write approach based on data deltas for fast partial writes. DABRI uses data deltas, the differences between the latest and original data values, instead of parity deltas to recover failed data. For each partial write, the data node sends the latest data, rather than the parity delta, to the parity nodes. The original data stored on the data node is read and sent to the parity nodes only when the data stored on the parity nodes is insufficient to maintain data reliability. This bypasses the computation of parity deltas and reduces the number of data reads. For a series of n partial writes to the same data, DABRI performs log-based updates for data and parity in the first write, and performs in-place data updates and log-based parity updates for the remaining n - 1 writes. In addition, the I/O between data nodes and parity nodes is scheduled to run in parallel in each partial write. We implement an erasure-coded prototype storage system based on DABRI to evaluate its performance. Experimental results on real-world traces show that DABRI significantly improves I/O throughput compared with state-of-the-art approaches.
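The relationship between a data delta and parity, as described in the abstract, can be illustrated with a minimal sketch. It assumes a single XOR parity block as a stand-in for a real erasure code; the abstract does not specify the code DABRI uses, and the names `xor_bytes`, `encode_parity`, and `stripe` are hypothetical. The sketch is not DABRI's implementation; it only shows why a data delta is sufficient to keep parity consistent after a partial write.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(blocks: Sequence[bytes]) -> bytes:
    """Single XOR parity over all data blocks (assumed toy code, not DABRI's)."""
    return reduce(xor_bytes, blocks)


# Hypothetical stripe of three data blocks and its parity.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
old_parity = encode_parity(stripe)

# Partial write to block 1.
original = stripe[1]
latest = b"\x11\x22\x33\x44"

# Data delta: the difference between the latest and original data values.
data_delta = xor_bytes(latest, original)

# Under XOR parity, folding the data delta into the old parity yields the
# parity of the updated stripe, without rereading the other data blocks.
new_parity = xor_bytes(old_parity, data_delta)

stripe[1] = latest
assert new_parity == encode_parity(stripe)

# Recovery check: block 1 can be rebuilt from the parity and surviving blocks.
recovered = xor_bytes(new_parity, xor_bytes(stripe[0], stripe[2]))
assert recovered == latest
```

With a general linear code such as Reed-Solomon, the same idea holds with the data delta multiplied by a coding coefficient over a Galois field; the XOR case corresponds to a coefficient of 1.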
computer science, theory & methods