Alternatives for Eliminating Duplicate in Data Storage

Tianming Yang, Jing Zhang, Wei Sun
DOI: https://doi.org/10.2991/iccnce.2013.140
2013-01-01
Abstract: Duplicate Elimination (DE) is a specialized data compression technique that eliminates duplicate copies of repeated data to optimize the use of storage space or bandwidth. The most common form of DE works by dividing files into chunks and comparing the chunks to detect duplicates. This paper implements a content-based chunking algorithm that improves duplicate elimination compared with fixed-size blocking, and evaluates two methods of chunk comparison: compare-by-hash and compare-by-value. The results indicate that compare-by-hash is efficient and feasible even when employed in ultra-large-scale storage systems.
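The abstract contrasts fixed-size blocking with content-based chunking, and compare-by-hash with compare-by-value, without spelling out the algorithms. The Python sketch below illustrates both ideas under stated assumptions. A Karp-Rabin style polynomial rolling hash stands in for whatever fingerprint the paper uses, and SHA-256 is assumed as the chunk digest. Every constant (48-byte window, 12-bit mask, 1 KiB minimum and 16 KiB maximum chunk, 4 KiB fixed blocks) is hypothetical, and so are the names `ChunkStore` and `duplicate_chunks`. Setting `verify_bytes=True` adds a byte-for-byte check on a digest match, a simple stand-in for compare-by-value.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not taken from the paper).
WINDOW = 48                # bytes covered by the rolling hash
MASK = (1 << 12) - 1       # cut when the low 12 hash bits are zero (1 in 4096)
MIN_CHUNK = 1024           # never cut a chunk shorter than this
MAX_CHUNK = 16 * 1024      # always cut a chunk once it reaches this size
BASE = 257                 # polynomial base of the rolling hash
MOD = (1 << 61) - 1        # Mersenne prime modulus
FIXED_SIZE = 4096          # block size for the fixed-size baseline


def fixed_chunks(data: bytes) -> list[bytes]:
    """Fixed-size blocking: cut every FIXED_SIZE bytes regardless of content."""
    return [data[i:i + FIXED_SIZE] for i in range(0, len(data), FIXED_SIZE)]


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut where a rolling hash of the last WINDOW bytes matches a pattern."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    out_weight = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * out_weight) % MOD
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Keeps one copy of each distinct chunk, indexed by its SHA-256 digest."""

    def __init__(self, verify_bytes: bool = False):
        # verify_bytes=False: compare-by-hash (equal digests => duplicate).
        # verify_bytes=True: also compare the bytes on a digest match.
        self.verify_bytes = verify_bytes
        self.chunks: dict[bytes, bytes] = {}

    def put(self, chunk: bytes) -> bool:
        """Store chunk if new; return True if it duplicated a stored chunk."""
        digest = hashlib.sha256(chunk).digest()
        stored = self.chunks.get(digest)
        if stored is None:
            self.chunks[digest] = chunk
            return False
        if self.verify_bytes and stored != chunk:
            raise ValueError("SHA-256 collision: same digest, different bytes")
        return True


def duplicate_chunks(old: bytes, new: bytes, chunker) -> int:
    """Count chunks of `new` already seen (from `old` or earlier in `new`)."""
    store = ChunkStore()
    for chunk in chunker(old):
        store.put(chunk)
    return sum(store.put(chunk) for chunk in chunker(new))


if __name__ == "__main__":
    old = random.Random(42).randbytes(128 * 1024)
    new = b"X" + old  # the same data with one byte inserted at the front
    print("fixed-size duplicates:", duplicate_chunks(old, new, fixed_chunks))
    print("content-defined duplicates:",
          duplicate_chunks(old, new, content_defined_chunks))
```

Inserting a single byte shifts every fixed-size block, so on random data essentially none of them match the stored copies. Content-defined boundaries depend only on the bytes near each cut point, so they typically realign after the first chunk and most chunks are detected as duplicates. The exact counts depend on the data and the chosen parameters.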