A Fast Duplicate Chunk Identifying Method Based on Hierarchical Indexing Structure

Can Wang, Zhi-Guang Qin, Lei Yang, Juan Wang
DOI: https://doi.org/10.1109/ICICEE.2012.169
2012-01-01
Abstract: To solve the disk bottleneck problem of deduplication systems without depending on data locality, a fast duplicate chunk identification method based on a hierarchical indexing structure is proposed. In this method, the traditional flat indexing structure is divided vertically into two layers, and only a handful of the most representative indices, selected according to Broder's theorem, are kept in RAM. Experimental results on real data that lack locality indicate that the deduplication performance of this method reaches 87.05% of the optimal value with a far smaller RAM requirement than current methods.
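The two-layer idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection rule or the index layout, so the sketch assumes that chunks are grouped into segments, that the representatives are the k smallest SHA-1 fingerprints of a segment (a min-hash style choice motivated by Broder's theorem, under which similar sets are likely to share their minimum elements), and that a dictionary stands in for the on-disk layer. The names `TwoLayerIndex`, `representatives`, and `add_segment` are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> bytes:
    """Content fingerprint of a chunk (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(chunk).digest()


def representatives(fingerprints, k):
    """Pick the k smallest distinct fingerprints as representatives.

    Assumption: a min-hash style rule. Similar segments are likely to
    share their smallest fingerprints, so these act as a compact sample.
    """
    return sorted(set(fingerprints))[:k]


class TwoLayerIndex:
    """Hypothetical two-layer index: small upper layer, full lower layer."""

    def __init__(self, k=2):
        self.k = k
        self.ram_index = {}      # upper layer: representative -> segment id
        self.disk_segments = {}  # lower layer stand-in: segment id -> fingerprints

    def add_segment(self, chunks):
        """Store a segment and return the chunks judged to be new."""
        fps = [fingerprint(c) for c in chunks]
        reps = representatives(fps, self.k)

        # Consult only the upper layer to find candidate segments.
        candidates = {self.ram_index[r] for r in reps if r in self.ram_index}

        # Load the full fingerprint sets of the candidates (the "disk" read).
        known = set()
        for sid in candidates:
            known |= self.disk_segments[sid]

        new_chunks, seen = [], set()
        for chunk, fp in zip(chunks, fps):
            if fp not in known and fp not in seen:
                new_chunks.append(chunk)
            seen.add(fp)

        sid = len(self.disk_segments)
        self.disk_segments[sid] = set(fps)
        for r in reps:
            self.ram_index.setdefault(r, sid)
        return new_chunks


index = TwoLayerIndex(k=2)
segment = [b"alpha", b"beta", b"gamma", b"delta"]
first = index.add_segment(segment)         # nothing stored yet: all chunks are new
second = index.add_segment(list(segment))  # same representatives: all duplicates
```

Because only representatives are checked in the upper layer, a segment whose representatives match no stored segment is treated as entirely new even if some of its chunks exist elsewhere. This kind of approximation is consistent with the abstract reporting a fraction of the optimal deduplication value rather than the optimum itself.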