Improved Deduplication Method based on Variable-Size Sliding Window

Can Wang, Zhiguang Qin, Lei Yang, Peng Nie
DOI: https://doi.org/10.4156/jdcta.vol5.issue9.9
2011-01-01
International Journal of Digital Content Technology and its Applications
Abstract: To improve deduplication performance while keeping metadata cost and time cost reasonable, this paper proposes a state deduplication method based on a variable-size sliding window, together with a universal performance-analysis model for deduplication methods. In this method, the data object is first divided into non-overlapping mini chunks based on its content; a variable-size sliding window, which uses the mini chunk as its basic unit of movement, is then used to identify duplicate data blocks. Furthermore, different chunking strategies are applied to the changed and unchanged regions of the data. Theoretical analysis indicates that the method can achieve satisfactory deduplication performance even with a relatively large expected chunk size, because it effectively reduces metadata cost and can identify smaller duplicate data blocks. Experimental results on real data show that the method's average compression ratio is 13.02% higher than that of SWC, which has the best deduplication performance among current deduplication methods, while its average time cost is 97.45% lower than that of SWC. The method is suited to mass data backup applications in network environments, which place more rigorous requirements on deduplication performance and time cost.
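The two-stage idea in the abstract (content-defined mini chunks, then a window that moves in units of mini chunks) can be illustrated with a minimal sketch. This is not the authors' implementation: the byte-sum fingerprint, the SHA-1 block index, the largest-span-first search, the way unmatched mini chunks are grouped, and all names and parameters (`mini_chunks`, `deduplicate`, `max_span`, `divisor`, and so on) are assumptions made only for illustration.

```python
import hashlib


def mini_chunks(data, window=4, divisor=16, max_len=64):
    """Split data into non-overlapping, content-defined mini chunks.

    Assumption: a boundary is placed after byte i when the sum of the last
    `window` bytes is divisible by `divisor` (a toy stand-in for a real
    rolling fingerprint), or when the chunk reaches `max_len` bytes.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < window:
            continue
        fingerprint = sum(data[i - window + 1 : i + 1])
        if fingerprint % divisor == 0 or length >= max_len:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last mini chunk
    return chunks


def digest(block):
    return hashlib.sha1(block).hexdigest()


def deduplicate(data, index, max_span=4):
    """Return a recipe (list of digests) for data; unseen blocks go into index.

    The window covers between 1 and `max_span` mini chunks and always
    advances by whole mini chunks, never by single bytes.
    """
    minis = mini_chunks(data)
    recipe, pending, pos = [], [], 0

    def flush():
        # Assumption: unmatched mini chunks are merged into one new block.
        if pending:
            block = b"".join(pending)
            key = digest(block)
            index.setdefault(key, block)
            recipe.append(key)
            pending.clear()

    while pos < len(minis):
        # Try the widest window first, then shrink it one mini chunk at a time.
        for span in range(min(max_span, len(minis) - pos), 0, -1):
            key = digest(b"".join(minis[pos : pos + span]))
            if key in index:
                flush()
                recipe.append(key)  # duplicate block: store a reference only
                pos += span
                break
        else:
            pending.append(minis[pos])
            pos += 1
            if len(pending) == max_span:
                flush()
    flush()
    return recipe


def restore(recipe, index):
    """Rebuild the original bytes from a recipe and the block index."""
    return b"".join(index[key] for key in recipe)
```

In this sketch a backup would call `deduplicate(data, index)` once per version against a shared `index` and keep only the returned recipe; `restore(recipe, index)` reproduces the original bytes. Because the window moves in mini-chunk steps rather than byte steps, far fewer lookups are needed than with a byte-wise sliding window, which matches the direction of the time-cost claim, though the sketch makes no attempt to reproduce the reported figures.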