Finesse: Fine-Grained Feature Locality Based Fast Resemblance Detection For Post-Deduplication Delta Compression

Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, Qiang Wang
2019-01-01
Abstract: In storage systems, delta compression is often used as a complementary data reduction technique for data deduplication because it is able to eliminate redundancy among the non-duplicate but highly similar chunks. Currently, what we call 'N-transform Super-Feature' (N-transform SF) is the most popular and widely used approach to computing data similarity for detecting delta compression candidates. But our observations suggest that the N-transform SF is compute-intensive: it needs to linearly transform each Rabin fingerprint of the data chunks N times to obtain N features, and can be simplified by exploiting the fine-grained feature locality existing among highly similar chunks to eliminate time-consuming linear transformations. Therefore, we propose Finesse, a fine-grained feature-locality-based fast resemblance detection approach that divides each chunk into several fixed-sized subchunks, computes features from these subchunks individually, and then groups the features into super-features. Experimental results show that, compared with the state-of-the-art N-transform SF approach, Finesse accelerates the similarity computation for resemblance detection by 3.2x to 3.5x and increases the final throughput of a deduplicated and delta compressed prototype system by 41% to 85%, while achieving comparable compression ratios.
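The three steps named in the abstract (split a chunk into fixed-size subchunks, compute one feature per subchunk, group the features into super-features) can be illustrated with a minimal Python sketch. Everything beyond those three steps is an assumption made for illustration, not a detail taken from the abstract: a polynomial rolling hash stands in for Rabin fingerprints, the feature of a subchunk is taken to be its largest window hash, and features are grouped by sorting small groups and combining entries of equal rank. The function names, the window size, and the defaults of 12 features and 3 super-features are likewise hypothetical.

```python
import hashlib
from typing import List

BASE = 257               # rolling-hash base (assumed, illustrative)
MOD = (1 << 61) - 1      # rolling-hash modulus (assumed, illustrative)
WINDOW = 48              # sliding-window width in bytes (assumed)


def subchunk_feature(sub: bytes, window: int = WINDOW) -> int:
    """Return the largest rolling hash over all windows of one subchunk.

    A polynomial rolling hash stands in for Rabin fingerprints here.
    """
    h = 0
    for b in sub[:window]:
        h = (h * BASE + b) % MOD
    best = h
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i in range(window, len(sub)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - sub[i - window] * top) * BASE + sub[i]) % MOD
        best = max(best, h)
    return best


def extract_features(chunk: bytes, n_features: int = 12) -> List[int]:
    """Split the chunk into n_features fixed-size subchunks, one feature each."""
    sub_len = max(1, len(chunk) // n_features)
    features = []
    for k in range(n_features):
        start = k * sub_len
        # The last subchunk absorbs any leftover bytes.
        end = len(chunk) if k == n_features - 1 else start + sub_len
        features.append(subchunk_feature(chunk[start:end]))
    return features


def group_features(features: List[int], n_sf: int = 3) -> List[int]:
    """Combine features into n_sf super-features (assumed rank-based grouping).

    Features are cut into consecutive groups of n_sf, each group is sorted,
    and super-feature j hashes the j-th smallest feature of every group.
    Each feature must fit in 8 bytes.
    """
    groups = [sorted(features[i:i + n_sf])
              for i in range(0, len(features), n_sf)]
    super_features = []
    for rank in range(n_sf):
        digest = hashlib.blake2b(digest_size=8)
        for group in groups:
            if rank < len(group):
                digest.update(group[rank].to_bytes(8, "big"))
        super_features.append(int.from_bytes(digest.digest(), "big"))
    return super_features


def finesse_super_features(chunk: bytes, n_features: int = 12,
                           n_sf: int = 3) -> List[int]:
    """Subchunk -> per-subchunk feature -> super-features, end to end."""
    return group_features(extract_features(chunk, n_features), n_sf)
```

In this sketch, two chunks would be treated as delta compression candidates when they share at least one super-feature. Because each feature depends only on its own subchunk, a small localized edit changes at most one feature, and no per-fingerprint linear transformation is needed.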