MassStore: A Low Bandwidth, High De-duplication Efficiency Network Backup System

Jiayang Du, Hongliang Yu, Weimin Zheng
DOI: https://doi.org/10.1109/icsai.2012.6223150
2012-01-01
Abstract: De-duplication technology has been widely used in disk-based backup systems to save disk space and reduce backup traffic over the Internet. Unfortunately, de-duplication-based backup systems often suffer from a metadata-indexing bottleneck that greatly reduces backup efficiency and throughput. Existing approaches usually exploit the similarity or the locality of the backup data flow to accelerate metadata indexing. In this paper, we design and implement MassStore, a de-duplication-based network backup system that uses a two-stage locality-sensitive hash algorithm to accelerate metadata indexing and thereby improve de-duplication efficiency. The algorithm combines the data similarity within each chunk set of the backup data flow with the locality between different chunk sets. Experimental results on real-world data sets show that MassStore not only reduced backup storage by an average of 88.5% but also lowered network bandwidth and RAM usage.
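The abstract does not spell out MassStore's two-stage locality-sensitive hash algorithm, so the following is only a minimal Python sketch of the general idea it describes, not the authors' implementation. It assumes SHA-1 chunk fingerprints, uses the minimum fingerprint of each chunk set as a MinHash-style representative for the similarity stage, and prefetches the adjacent stored chunk set for the locality stage. The names `TwoStageIndex`, `fingerprint`, and `backup` are hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def fingerprint(chunk: bytes) -> str:
    """SHA-1 fingerprint of a chunk (assumed choice of hash)."""
    return hashlib.sha1(chunk).hexdigest()


class TwoStageIndex:
    """Hypothetical similarity + locality index for chunk-level de-duplication.

    Stage 1 (similarity): each stored chunk set is represented in RAM only by
    its minimum fingerprint. Two chunk sets share the same minimum with
    probability equal to their Jaccard similarity (MinHash property).
    Stage 2 (locality): on a hit, the full fingerprint list of the matched
    chunk set, plus that of the chunk set stored right after it, is loaded
    into a cache and used to de-duplicate the incoming chunks.
    """

    def __init__(self) -> None:
        self.sparse: Dict[str, int] = {}  # representative fp -> chunk-set id (kept in RAM)
        self.sets: List[List[str]] = []   # chunk-set id -> full fp list (stands in for disk)

    def _candidates(self, rep: str) -> Set[str]:
        set_id = self.sparse.get(rep)
        if set_id is None:
            return set()                  # no similar chunk set found
        cache = set(self.sets[set_id])
        if set_id + 1 < len(self.sets):   # assumed locality step: prefetch neighbour
            cache.update(self.sets[set_id + 1])
        return cache

    def backup(self, chunks: List[bytes]) -> Tuple[int, int]:
        """Index one chunk set; return (new_chunks, duplicate_chunks)."""
        fps = [fingerprint(c) for c in chunks]
        if not fps:
            return (0, 0)
        rep = min(fps)                    # stage 1: similarity lookup key
        seen = set(self._candidates(rep)) # stage 2: locality cache
        new = 0
        for fp in fps:
            if fp not in seen:
                new += 1                  # would be transferred and stored
                seen.add(fp)              # also catches repeats within this set
        self.sets.append(fps)
        self.sparse[rep] = len(self.sets) - 1  # point at the newest version
        return new, len(fps) - new


if __name__ == "__main__":
    idx = TwoStageIndex()
    day1 = [f"block-{i}".encode() for i in range(100)]
    print(idx.backup(day1))        # (100, 0): first full backup
    print(idx.backup(list(day1)))  # (0, 100): unchanged data is fully de-duplicated
```

Because only one representative fingerprint per chunk set is kept in RAM, the in-memory index grows with the number of chunk sets rather than the number of chunks. The trade-off is that a modified chunk set finds its predecessor only with probability equal to their Jaccard similarity, so some duplicates can be missed.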