TBF: a high-efficient query mechanism in de-duplication backup system

Bin Zhou, Hai Jin, Xia Xie, PingPeng Yuan
DOI: https://doi.org/10.1007/978-3-642-30767-6_21
2012-01-01
Abstract: For big data, the set of data-chunk fingerprints is too large to be held entirely in memory. To address this, a new query mechanism, the Two-stage Bloom Filter (TBF), is proposed. First, each bit of the second-stage Bloom filter represents the chunks that have identical fingerprints, which reduces the false-positive rate. Second, a two-dimensional list corresponding to the two-stage Bloom filter is created to gather the absolute addresses of the data chunks with identical fingerprints. Finally, we suggest a new class of hash functions with strong global randomness. The Two-stage Bloom Filter decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false-positive rate. Our experiments indicate that, for the same first-stage Bloom filter length, the Two-stage Bloom Filter reduces the storage accesses caused by false positives by about 30% to 40%.
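The lookup path described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual data layout, so everything here is an assumption made for illustration: the class name `TwoStageBloomFilter`, the sizes `m1` and `m2`, the hash count `k`, the choice to tie the two-dimensional address list to the second-stage bits, and the use of salted SHA-256 in place of the paper's proposed hash function class.

```python
import hashlib


class TwoStageBloomFilter:
    """Illustrative two-stage lookup; not the paper's exact design."""

    def __init__(self, m1=1 << 16, m2=1 << 12, k=3):
        # m1, m2, and k are hypothetical parameters chosen for the sketch.
        self.m1, self.m2, self.k = m1, m2, k
        self.stage1 = bytearray(m1)  # one byte per bit, for readability
        self.stage2 = bytearray(m2)
        # Two-dimensional list: one row of chunk addresses per stage-2 bit.
        self.addresses = [[] for _ in range(m2)]

    def _positions(self, fingerprint: bytes, m: int, salt: bytes, count: int):
        # Salted SHA-256 stands in for the paper's hash function class.
        for i in range(count):
            digest = hashlib.sha256(salt + bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % m

    def _slot(self, fingerprint: bytes) -> int:
        # Assumption: a fingerprint maps to exactly one stage-2 bit.
        return next(self._positions(fingerprint, self.m2, b"stage2", 1))

    def add(self, fingerprint: bytes, address: int) -> None:
        for pos in self._positions(fingerprint, self.m1, b"stage1", self.k):
            self.stage1[pos] = 1
        slot = self._slot(fingerprint)
        self.stage2[slot] = 1
        self.addresses[slot].append(address)

    def lookup(self, fingerprint: bytes) -> list:
        """Return candidate chunk addresses; an empty list means 'new chunk'."""
        hits = self._positions(fingerprint, self.m1, b"stage1", self.k)
        if not all(self.stage1[pos] for pos in hits):
            return []  # stage 1 says the fingerprint was never added
        slot = self._slot(fingerprint)
        if not self.stage2[slot]:
            return []  # stage-1 false positive caught without a disk access
        return list(self.addresses[slot])
```

In this sketch, a fingerprint that passes the first stage but finds its second-stage bit unset is rejected in memory, which is where the saved disk accesses would come from. The returned addresses are only candidates: different fingerprints can share a second-stage bit, so a caller would still need to verify the chunk on disk.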