SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers

Lei Xu, Jian Hu, Stephen Mkandawire, Hong Jiang
DOI: https://doi.org/10.1109/icdcsw.2011.31
2011-01-01
Abstract: Data deduplication techniques are ideal solutions for reducing both bandwidth and storage space requirements for cloud backup services in data centers. Current data deduplication solutions identify redundant data by comparing fingerprints (hash values) of data chunks, and they store those fingerprints on a centralized server. This approach limits overall throughput and concurrency in large-scale systems. Furthermore, the slow seek time of hard disks degrades the performance of hash lookup operations, which are mainly random I/Os. In this paper we present a scalable hybrid hash cluster (SHHC) that maintains a low-latency distributed hash table for storing data fingerprints. Each hybrid node in the cluster combines RAM with solid-state drives (SSDs) to take advantage of the fast random access inherent in SSDs. This distributed approach makes the system scalable, balances the load on the hash store, and significantly reduces the latency of the hash lookup process.
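The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how fingerprints are partitioned across nodes or how RAM and SSD cooperate inside a node, so the modulo routing, the SHA-1 fingerprint, the LRU cache in RAM, and every name below (`HybridNode`, `HashCluster`, `ram_capacity`) are assumptions made for illustration. The SSD-resident table is simulated with an in-memory set.

```python
import hashlib
from collections import OrderedDict


def fingerprint(chunk: bytes) -> str:
    """Return a hex digest used as the chunk fingerprint (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).hexdigest()


class HybridNode:
    """Hypothetical hybrid node: a small LRU cache in RAM over a larger SSD table."""

    def __init__(self, ram_capacity: int = 2) -> None:
        self.ram = OrderedDict()  # fingerprint -> True, kept in LRU order
        self.ssd = set()          # stand-in for the SSD-resident hash table
        self.ram_capacity = ram_capacity

    def _cache(self, fp: str) -> None:
        # Mark fp as most recently used and evict the oldest entry if RAM is full.
        self.ram[fp] = True
        self.ram.move_to_end(fp)
        if len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)

    def lookup_or_insert(self, fp: str) -> bool:
        """Return True if fp is already stored (duplicate chunk); otherwise store it."""
        if fp in self.ram:        # fast path: hit in RAM
            self.ram.move_to_end(fp)
            return True
        if fp in self.ssd:        # slower path: random read on the SSD table
            self._cache(fp)
            return True
        self.ssd.add(fp)          # new fingerprint: persist it and cache it
        self._cache(fp)
        return False


class HashCluster:
    """Hypothetical cluster front end that spreads fingerprints over the nodes."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [HybridNode() for _ in range(num_nodes)]

    def node_for(self, fp: str) -> HybridNode:
        # Assumed partitioning: fingerprint value modulo the number of nodes.
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def is_duplicate(self, chunk: bytes) -> bool:
        fp = fingerprint(chunk)
        return self.node_for(fp).lookup_or_insert(fp)


if __name__ == "__main__":
    cluster = HashCluster(num_nodes=4)
    print(cluster.is_duplicate(b"backup block A"))  # first sighting: stored
    print(cluster.is_duplicate(b"backup block A"))  # second sighting: duplicate
```

Because cryptographic hash values are close to uniformly distributed, even this simple modulo routing spreads lookups roughly evenly across nodes, which is the load-balancing property the abstract refers to. A real deployment would likely need a scheme that tolerates nodes joining and leaving, such as consistent hashing.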