A Novel Data Routing Strategy Based on Directories for Deduplication Clusters

Lifang Wang, Zhike Zhang, Zejun Jiang, Xiaobin Cai, Chengzhang Peng
DOI: https://doi.org/10.3969/j.issn.1000-2758.2014.04.038
2014-01-01
Abstract: A deduplication cluster is an effective way to meet the growing demand for massive data backup. Its key problem is how to distribute data across the nodes of the cluster, that is, the data routing strategy. Existing data routing strategies use the MCS (Minimum Chunk Signature) of a file or data segment to compute the target routing node. When the deduplication cluster is small, the storage utilization of MCS approaches that of single-node deduplication. However, when the cluster is large, its storage utilization is much lower than that of single-node deduplication. To improve the storage utilization of the deduplication cluster, we propose a novel data routing strategy based on directories, which we call DRSD (Data Routing Strategy Based on Directories). Preliminary experimental results and their analysis show that, for various numbers of nodes in the deduplication cluster, the deduplication ratios obtained with DRSD are much better than those obtained with MCS and even approach those obtained with single-node deduplication. When the number of nodes is 64, the deduplication ratio obtained with DRSD is 35% better than that obtained with MCS.
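To make the MCS baseline concrete, the following is a minimal sketch of routing by minimum chunk signature. The abstract states only that MCS computes the target node from the minimum chunk signature of a file or data segment; the fixed-size chunking, the SHA-1 signature, the modulo mapping to a node index, and the names `chunk_signatures` and `mcs_route` are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib


def chunk_signatures(data: bytes, chunk_size: int = 4096) -> list[int]:
    """Split data into fixed-size chunks and return each chunk's SHA-1 as an integer.

    Fixed-size chunking and SHA-1 are illustrative assumptions.
    """
    return [
        int.from_bytes(hashlib.sha1(data[i:i + chunk_size]).digest(), "big")
        for i in range(0, len(data), chunk_size)
    ]


def mcs_route(data: bytes, num_nodes: int, chunk_size: int = 4096) -> int:
    """Pick a target node from the minimum chunk signature of a file or segment.

    The modulo mapping from signature to node index is an assumption.
    """
    signatures = chunk_signatures(data, chunk_size)
    if not signatures:
        return 0  # empty input: fall back to node 0 (arbitrary choice)
    return min(signatures) % num_nodes


if __name__ == "__main__":
    segment_a = b"A" * 4096 + b"B" * 4096
    segment_b = b"B" * 4096 + b"A" * 4096  # same chunks, different order
    # Both segments contain the same chunk set, so they share a minimum
    # signature and are routed to the same node.
    print(mcs_route(segment_a, 64) == mcs_route(segment_b, 64))
```

Under these assumptions, two segments are routed together only when they share the same minimum-signature chunk, so the routing decision depends on a single chunk per file or segment.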