DA Placement: A Dual-Aware Data Placement in a Deduplicated and Erasure-Coded Storage System

Mingzhu Deng, Ming Zhao, Fang Liu, Zhiguang Chen, Nong Xiao
DOI: https://doi.org/10.1007/978-3-030-05051-1_25
2018-01-01
Abstract: Incorporating both deduplication and erasure coding is preferred in modern storage systems for enhanced storage efficiency and economical data reliability. However, a simple combination suffers from the "read imbalance problem", in which parallel data accesses are curbed by throttled storage nodes. This problem is due to uneven data placement in the system, which is unaware of the use of both deduplication and erasure coding, each of which alters the order of data if unattended. This paper proposes a systematic design and implementation of a Dual-Aware (DA) placement in a combined storage system to achieve both deduplication awareness and erasure-coding awareness at the same time. DA not only records the node number of each unique piece of data to allow quick reference, but also dynamically tracks the nodes used for each write request. In this way, deduplication awareness is formed to skip inconvenient placement locations. In addition, DA serializes the placement of parity blocks within a stripe and across stripes. This realization of erasure-coding awareness ensures the separation of data and parity and maintains data sequentiality at bordering stripes. DA can also be extended with further load balancing through an innovative use of the deduplication level, which intuitively predicts future accesses of a piece of data. In short, DA boosts system performance with little memory or computation cost. Extensive experiments using both real-world traces and synthesized workloads show that DA achieves better read performance. For example, DA leads by an average latency margin of 30.86% and 29.63% over the baseline rolling placement (BA) and random placement (RA), respectively, under CAFTL traces on a default cluster of 12 nodes with RS(8,4).
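The read imbalance problem and the deduplication-aware part of the idea can be illustrated with a toy model. The sketch below is not the authors' implementation: it ignores parity blocks, stripes, and the deduplication level entirely, and the names `place` and `max_read_load` are hypothetical. It only assumes what the abstract states, namely that the node of each unique chunk is recorded and that the nodes used by the current write request are tracked so that already-used nodes can be skipped.

```python
from collections import Counter


def place(requests, num_nodes, dedup_aware):
    """Write all requests in order; return {fingerprint: node}.

    Each request is a list of chunk fingerprints. A chunk whose
    fingerprint is already stored is a duplicate and is not rewritten.
    """
    location = {}  # fingerprint -> node holding the unique chunk
    cursor = 0     # next node in rolling (round-robin) order
    for request in requests:
        # Nodes this request will read from because of duplicate chunks.
        used = {location[fp] for fp in request if fp in location}
        for fp in request:
            if fp in location:
                continue  # duplicate: only a reference is kept
            node = cursor % num_nodes  # plain rolling choice
            if dedup_aware:
                # Assumption: skip nodes already used by this request,
                # falling back to the rolling choice if none is free.
                for step in range(num_nodes):
                    candidate = (cursor + step) % num_nodes
                    if candidate not in used:
                        node = candidate
                        break
            location[fp] = node
            used.add(node)
            cursor = node + 1
    return location


def max_read_load(request, location):
    """Chunks served by the busiest node when the request is read in parallel."""
    return max(Counter(location[fp] for fp in request).values())


requests = [["a", "b", "c", "d"], ["e", "a", "f", "g"]]  # "a" is a duplicate
rolling = place(requests, num_nodes=4, dedup_aware=False)
aware = place(requests, num_nodes=4, dedup_aware=True)
print(max_read_load(requests[1], rolling), max_read_load(requests[1], aware))
```

In this toy case the rolling placement puts the new chunk "e" on the same node that already holds the duplicate "a", so one node serves two chunks of the second request, while the aware variant spreads the four chunks over four nodes.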