A Method Of Deduplication For Data Remote Backup

Jingyu Liu, Yu-an Tan, Yuanzhang Li, Xue-lan Zhang, Zexiang Zhou
DOI: https://doi.org/10.1007/978-3-642-18333-1_10
2011-01-01
Abstract: The paper describes a Remote Data Disaster Recovery System that uses hashes to identify duplicate data blocks and avoid sending them between the Primary Node and the Secondary Node, thereby reducing the network bandwidth used for data replication, decreasing overhead, and improving network efficiency. On both nodes, extra storage space (the Hash Repository) alongside the data disks records the hash of each data block on the data disks. We extend the data replication protocol between the Primary Node and the Secondary Node: when a block to be replicated is a duplicate, that is, its hash already exists in the Hash Repository, the block address is transferred instead of the data. This reduces the network bandwidth requirement, shortens synchronization time, and improves network efficiency.
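The replication idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (`Node`, `primary_write`, `secondary_apply`), the message tags `"DATA"` and `"REF"`, the use of SHA-256, and the 4096-byte block size are all assumptions made for illustration, and hash collisions are ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not specify one


def block_hash(data: bytes) -> bytes:
    # SHA-256 is an assumption; the abstract only says "Hash".
    return hashlib.sha256(data).digest()


class Node:
    """A data disk plus a Hash Repository (hash -> address of a block with that content)."""

    def __init__(self):
        self.disk = {}       # address -> block bytes
        self.hash_repo = {}  # hash -> one address currently holding that content
        self.addr_hash = {}  # address -> hash, used to drop stale entries on overwrite

    def store(self, addr: int, data: bytes) -> None:
        # If this address was the repository's reference for its old content,
        # that entry becomes stale once the block is overwritten.
        old = self.addr_hash.get(addr)
        if old is not None and self.hash_repo.get(old) == addr:
            del self.hash_repo[old]
        h = block_hash(data)
        self.disk[addr] = data
        self.addr_hash[addr] = h
        # Keep an existing reference address if one is already recorded.
        self.hash_repo.setdefault(h, addr)


def primary_write(primary: Node, addr: int, data: bytes):
    """Write a block on the primary and return the replication message to send."""
    src = primary.hash_repo.get(block_hash(data))  # look up before storing
    primary.store(addr, data)
    if src is not None:
        return ("REF", addr, src)   # duplicate: send only the block address
    return ("DATA", addr, data)     # new content: send the block itself


def secondary_apply(secondary: Node, message) -> None:
    """Apply one replication message on the secondary, in arrival order."""
    kind, addr, payload = message
    data = secondary.disk[payload] if kind == "REF" else payload
    secondary.store(addr, data)
```

Because both nodes apply the same writes in the same order, an address referenced by a `"REF"` message holds the same content on the secondary as on the primary, so the secondary can copy the block locally instead of receiving it over the network.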