A Remote Data Backup System with Deduplication

Lu Youyou, Ao Li, Shu Jiwu
2012-01-01
Journal of Computer Research and Development
Abstract: Data deduplication has greatly improved the efficiency of backup systems, but at the cost of increasing data-matching overhead. In this paper, we design and implement a remote data backup system with deduplication, named THBS, and propose HAD (hierarchy approach for data deduplication), which deduplicates data at the directory, file, chunk, and byte levels in turn. In addition, a Bloom filter and an inverted index are used to reduce the number of file searches and disk accesses. Two experiments on real scenarios show that THBS reduces storage usage by 63.1%-96.7%, and consumes 71.3%-97.6% and 41.2%-66.7% less bandwidth, and 75%-86% and 91%-97% the time, compared with scp and rsync respectively.
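As a rough illustration of how a Bloom filter can reduce index lookups during chunk-level deduplication, the following is a minimal sketch in Python. It is not the THBS implementation: the class names `BloomFilter` and `ChunkStore`, the fixed-size chunking, the SHA-1 fingerprints, and the in-memory dictionary standing in for an on-disk index are all assumptions made for this example, and it covers only the chunk level of the four HAD levels.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter (hypothetical helper, not taken from the paper)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ChunkStore:
    """Chunk-level deduplicating store; the dict stands in for a disk index."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # fixed-size chunking is an assumption
        self.bloom = BloomFilter()
        self.index = {}               # fingerprint -> chunk bytes
        self.index_lookups = 0        # counts the simulated "disk" lookups

    def backup(self, data):
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha1(chunk).digest()
            if fp in self.bloom:
                # Possibly seen before: only now consult the expensive index.
                self.index_lookups += 1
                if fp not in self.index:
                    self.index[fp] = chunk  # Bloom filter false positive
            else:
                # Definitely new: store it without any index lookup.
                self.bloom.add(fp)
                self.index[fp] = chunk
            recipe.append(fp)
        return recipe

    def restore(self, recipe):
        return b"".join(self.index[fp] for fp in recipe)


store = ChunkStore(chunk_size=4)
data = b"AAAABBBBAAAACCCC"
recipe = store.backup(data)
restored = store.restore(recipe)
```

In this sketch the filter answers "definitely new" for most first-time chunks, so the index is consulted only for chunks that may already be stored; the repeated `AAAA` chunk above is kept once and referenced twice in the recipe.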