MGRM: A Multi-Segment Greedy Rewriting Method to Alleviate Data Fragmentation in Deduplication-Based Cloud Backup Systems
Datong Zhang, Yuhui Deng, Yi Zhou, Jie Li, Weiheng Zhu, Geyong Min
DOI: https://doi.org/10.1109/tcc.2022.3214816
IF: 5.697
2022-01-01
IEEE Transactions on Cloud Computing
Abstract: Data deduplication is broadly used in the cloud because it saves storage space. Capping methods, which rewrite the data chunks of containers with a low Container Reference Ratio (CRR), have been developed to alleviate data fragmentation in the cloud. Our analysis of real traces shows that some segments point only to low-CRR containers, while others contain only high-CRR containers. Existing capping methods ignore this observation. To address this problem, we propose a multi-segment greedy rewriting method named MGRM. MGRM sorts the containers of segments sequentially. More specifically, given the $i$th segment currently being processed, MGRM sorts all the containers in the first $i$ segments. This search strategy enables MGRM to select and rewrite the true low-reference container set. Moreover, to balance deduplication ratio against restore performance, MGRM has two working modes: an optimal rewriting mode and a radical rewriting mode. In the optimal rewriting mode, MGRM aims to improve the deduplication ratio; in the radical rewriting mode, it strives to improve restore performance. MGRM adaptively switches between the two modes according to the workload. Furthermore, unlike existing capping methods, which improve restore performance at the cost of the deduplication ratio, MGRM addresses both aspects. Our extensive experimental results show that MGRM achieves high restore performance together with a high deduplication ratio. In particular, compared with the two state-of-the-art schemes FC and FLC, MGRM improves the deduplication ratio and restore performance by up to 114.83% and 99.34%, respectively.
computer science, information systems, theory & methods
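The contrast the abstract draws between per-segment capping and sorting containers across the first $i$ segments can be illustrated with a small sketch. This is not the authors' implementation: the function names, the representation of a segment as a list of container IDs (one per referenced chunk), the `cap` parameter, and the cumulative budget of `cap * i` containers are all assumptions made here for illustration.

```python
from collections import Counter


def per_segment_capping(segments, cap):
    """Baseline sketch: rank containers inside each segment on its own.

    Keeps the `cap` most-referenced containers of every segment and marks
    the remaining containers of that segment for rewriting.
    """
    decisions = []
    for seg in segments:
        counts = Counter(seg)
        # Sort by descending reference count; break ties by container ID.
        ranked = sorted(counts, key=lambda c: (-counts[c], c))
        keep = set(ranked[:cap])
        decisions.append(sorted(set(seg) - keep))
    return decisions


def multi_segment_greedy(segments, cap):
    """Hypothetical multi-segment variant.

    When segment i is processed, containers from segments 1..i are ranked
    together by cumulative reference count. The budget of `cap * i` kept
    containers is an assumption of this sketch, not taken from the paper.
    """
    cumulative = Counter()
    decisions = []
    for i, seg in enumerate(segments, start=1):
        cumulative.update(seg)
        ranked = sorted(cumulative, key=lambda c: (-cumulative[c], c))
        keep = set(ranked[:cap * i])
        decisions.append(sorted(set(seg) - keep))
    return decisions


if __name__ == "__main__":
    # Segment 1 references only sparsely used containers;
    # segment 2 references only heavily used ones.
    segments = [
        ["D", "D", "E", "F"],
        ["A"] * 5 + ["B"] * 4 + ["C"] * 3,
    ]
    print(per_segment_capping(segments, cap=2))
    print(multi_segment_greedy(segments, cap=2))
```

In this toy input, the per-segment baseline marks container C for rewriting even though it is referenced three times, simply because it ranks third within its own segment. The cross-segment ranking keeps C and spends the rewrite on a sparsely referenced container instead, which is the kind of behavior the abstract attributes to selecting the true low-reference container set.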