TSCF: An Efficient Two-Stage Cuckoo Filter for Data Deduplication

Tao Liu, Qinshu Chen, Hui Li, Bohui Wang, Xin Yang
DOI: https://doi.org/10.1109/msn53354.2021.00118
2021-01-01
Abstract: The rapid growth of data on the Internet has brought huge challenges to storage systems. Data deduplication technology has been proposed to address data redundancy. Memory-assisted deduplication, one such technology, uses an approximate membership data structure to greatly reduce the space required for membership testing. Approximate membership data structures, notably the cuckoo filter, have been widely adopted. However, there is still no efficient way to curb the exponential growth of a cuckoo filter's insertion time as its load factor increases. This paper proposes TSCF, an efficient cuckoo filter with a two-stage insertion algorithm. In the first stage, TSCF balances the filter's load through active relocations, laying the foundation for the second stage. Experiments show that TSCF's cumulative relocation count is 37% of that of SCF and 46% of that of CFBF, indicating that TSCF greatly reduces both the number of relocations and the insertion time over the entire insertion process and improves the performance of the cuckoo filter.
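The abstract turns on one mechanism. When both candidate buckets of a new item are full, a cuckoo filter must relocate (evict) stored fingerprints to their alternate buckets, and the number of relocations climbs steeply as the load factor grows. The sketch below is a baseline cuckoo filter using standard partial-key cuckoo hashing. It is not TSCF's two-stage algorithm, which the abstract does not specify. It only counts cumulative relocations at several load factors to make that growth visible. The class name, the `relocations_by_load` helper, and all parameters (1024 buckets, 4 slots per bucket, 12-bit fingerprints, 500 maximum kicks) are illustrative assumptions.

```python
import hashlib
import random


def _h64(data: bytes) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class CuckooFilter:
    """Baseline cuckoo filter (partial-key cuckoo hashing) with a relocation counter.

    Hypothetical, illustrative parameters. num_buckets must be a power of two so that
    XOR-ing with the masked fingerprint hash maps each candidate bucket to the other.
    """

    def __init__(self, num_buckets=1024, slots_per_bucket=4, fp_bits=12,
                 max_kicks=500, seed=0):
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.m = num_buckets
        self.b = slots_per_bucket
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0
        self.relocations = 0  # cumulative number of evicted (relocated) fingerprints
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def _index_and_fp(self, item: str):
        h = _h64(item.encode())
        # Low bits choose the first bucket; high 32 bits (masked) form the fingerprint.
        return h & (self.m - 1), (h >> 32) & self.fp_mask

    def _alt_index(self, i: int, fp: int) -> int:
        # i2 = i1 ^ H(fp) and i1 = i2 ^ H(fp): only the fingerprint is needed to move an entry.
        return i ^ (_h64(fp.to_bytes(4, "little")) & (self.m - 1))

    def insert(self, item: str) -> bool:
        i1, fp = self._index_and_fp(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                self.size += 1
                return True
        # Both candidate buckets are full: swap in the new fingerprint, evict a random one,
        # and move the evicted fingerprint to its alternate bucket until a slot is free.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            self.relocations += 1
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                self.size += 1
                return True
        # Give up; this sketch drops the last evicted fingerprint (real filters keep a victim slot).
        return False

    def contains(self, item: str) -> bool:
        i1, fp = self._index_and_fp(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def load_factor(self) -> float:
        return self.size / (self.m * self.b)


def relocations_by_load(cf: CuckooFilter, targets, prefix="item"):
    """Insert items until each target load factor is reached; record cumulative relocations."""
    capacity = cf.m * cf.b
    history, n = {}, 0
    for target in sorted(targets):
        while cf.size < int(target * capacity):
            if not cf.insert(f"{prefix}-{n}"):
                return history  # stop at the first failed insertion
            n += 1
        history[target] = cf.relocations
    return history


if __name__ == "__main__":
    cf = CuckooFilter()
    for load, relocs in relocations_by_load(cf, [0.25, 0.5, 0.75, 0.85, 0.9]).items():
        print(f"load {load:.0%}: cumulative relocations = {relocs}")
```

This baseline places a fingerprint in the first candidate bucket with a free slot and relocates only when both are full, so most relocations are concentrated at high load. The abstract reports its comparison with SCF and CFBF in terms of this same quantity, cumulative relocation times, over the whole insertion process.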