SAUD: Semantics-Aware and Utility-Driven Deduplication Framework for Primary Storage.

Yan Tang, Jianwei Yin, Wei Lo
DOI: https://doi.org/10.1109/hpcc-css-icess.2015.226
2015-01-01
Abstract: Data deduplication is an efficient technology for reducing storage cost in cloud storage systems, especially now that massive data volumes have become the norm in the era of Big Data. Primary storage, as the layer that interacts directly with service users, has benefited from deduplication technologies because of its high manufacturing cost. However, since primary storage is constantly accessed by users, its workloads are mostly latency-sensitive. This characteristic makes it challenging to develop deduplication schemes for primary storage that are both performance-efficient and space-efficient. Existing deduplication schemes for primary storage pay little attention to achieving desirable space savings while keeping the inherent performance penalty small. In this paper, we propose SAUD, a Semantics-Aware and Utility-Driven deduplication framework that provides high space savings with a minor performance penalty for primary storage. SAUD delivers performance-oriented deduplication by leveraging the file-level semantics of primary storage in a quantitative way: it calculates a deduplication priority for files with diverse semantics and uses these priorities as deduplication instructions. Moreover, SAUD operates in a selective-on mode, dynamically regulating the deduplication process based on the real-time workload and system status, which further reduces the side effects on system performance. Comprehensive evaluations show that SAUD outperforms all other compared schemes in system read performance by an average of 54.6%. SAUD achieves 82.1% of the space efficiency of the most space-oriented scheme, whose read performance falls behind that of SAUD by as much as 80.1%.
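The abstract does not give SAUD's actual priority formula or its selective-on policy, so the following is only an illustrative sketch of the two ideas it names: a per-file deduplication priority and a workload-based gate. Every name, field, weight, and threshold here (`FileInfo`, `dedup_priority`, `select_for_dedup`, `load_threshold`, and so on) is a hypothetical assumption made for illustration, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    # All fields are assumed stand-ins for "file-level semantics".
    name: str
    size_bytes: int
    reads_per_hour: float       # hot files are costly to deduplicate
    expected_dup_ratio: float   # assumed estimate in [0, 1]


def dedup_priority(f: FileInfo, w_perf: float = 1.0) -> float:
    """Hypothetical utility: expected bytes saved, discounted by read heat."""
    expected_saving = f.size_bytes * f.expected_dup_ratio
    read_penalty = 1.0 + w_perf * f.reads_per_hour
    return expected_saving / read_penalty


def select_for_dedup(files, io_load: float, load_threshold: float = 0.7):
    """Selective-on gate (assumed policy): do nothing while the system is busy,
    otherwise return file names ordered by descending priority."""
    if io_load >= load_threshold:
        return []
    ranked = sorted(files, key=dedup_priority, reverse=True)
    # Files with zero expected saving are never worth deduplicating.
    return [f.name for f in ranked if dedup_priority(f) > 0]


files = [
    FileInfo("hot.db", 1000, 9.0, 0.5),
    FileInfo("vm.img", 1000, 0.0, 0.5),
    FileInfo("unique.bin", 1000, 0.0, 0.0),
]
idle_choice = select_for_dedup(files, io_load=0.2)
busy_choice = select_for_dedup(files, io_load=0.9)
```

In this toy setup the cold, highly redundant file is ranked ahead of the frequently read one, and nothing is selected while the assumed I/O load exceeds the threshold, which mirrors the trade-off between space saving and read performance that the abstract describes.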