Deduplication Model Based on File-Similarity Clustering

WANG Can, QIN Zhi-guang, WANG Juan, CAI Bo
DOI: https://doi.org/10.3969/j.issn.1001-3695.2012.05.022
2012-01-01
Abstract: To resolve the locality-dependence and multi-node-dependence problems of current throughput-improvement methods for deduplication systems, this paper proposes a deduplication model based on file-similarity clustering. The model extends the traditional flat index structure into a spatial structure. Following Broder's theorem, it keeps only a handful of the most representative indices in RAM. It partitions the index horizontally and distributes the partitions across several fully autonomous storage nodes. Experimental results indicate that, in a large-scale cloud-storage environment, the model effectively improves deduplication performance and average throughput while keeping data loads balanced. The model can therefore be extended smoothly.
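The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind Broder's theorem: the probability that two sets share the same minimum hash value equals their Jaccard similarity, so a few minimum fingerprints can stand in for a file's full chunk index. Fixed-size chunking, SHA-1 fingerprints, the modulo routing rule, and all function names here are illustrative assumptions, not the authors' method.

```python
import hashlib
from typing import List, Set


def chunk_fingerprints(data: bytes, chunk_size: int = 8) -> Set[int]:
    """Fingerprint each fixed-size chunk with SHA-1 (assumed chunking scheme)."""
    return {
        int.from_bytes(hashlib.sha1(data[i:i + chunk_size]).digest(), "big")
        for i in range(0, len(data), chunk_size)
    }


def representatives(fps: Set[int], k: int = 1) -> List[int]:
    """Keep only the k smallest fingerprints as the in-RAM representatives."""
    return sorted(fps)[:k]


def route(fps: Set[int], num_nodes: int) -> int:
    """Hypothetical routing rule: pick a storage node from the minimum fingerprint.

    Similar files are likely to share their minimum fingerprint, so they tend
    to land on the same node without any cross-node lookup.
    """
    return min(fps) % num_nodes


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Exact Jaccard similarity, for comparison with the min-hash estimate."""
    return len(a & b) / len(a | b)
```

Under these assumptions, two files with identical content always get the same representatives and the same node, while files that share most of their chunks do so with probability equal to their Jaccard similarity.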