Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures

Xiangyang Gou, Chenxingyu Zhao, Tong Yang, Lei Zou, Yang Zhou, Yibo Yan, Xiaoming Li, Bin Cui
DOI: https://doi.org/10.1109/bigcomp.2018.00048
2018-01-01
Abstract: As the scale of data that must be stored or monitored in today's networks keeps increasing, hash-based data structures are more and more widely used because of their high memory efficiency and high speed. Most of them, such as Bloom filters, sketches, and d-left hash tables, use more than one hash function. Furthermore, to achieve good randomness, the hash functions used, such as MD5 and SHA-1, are very complicated and consume many CPU cycles. As a consequence, computing these hash functions is time-consuming. To address this issue, we propose the Single Hash technique in this paper. It is based on the observation that the hash functions we use produce 32-bit or 64-bit values, whose value ranges are much larger than what we need in practice. We usually have to carry out a modulo operation to map the hash results into a smaller range in the data structures listed above. In this procedure, the information carried by the high bits may be discarded. For example, if the bit array of a Bloom filter has length 2^20 and the hash functions are 32-bit, then 12 bits of each hash result are discarded by the modulo operation. We can use these bits to produce more hash values. Therefore, we propose to use a few bit operations to make full use of the information produced by one hash function and to generate multiple hash values that can be used in these data structures. The Single Hash technique can be applied to most hash-based data structures. It can significantly improve their speed because, instead of computing multiple hash functions, we only need to compute one hash function and a few simple operations (e.g., bit shift and XOR). Other aspects of performance, such as the memory efficiency and accuracy of these data structures, are not affected by the Single Hash technique.
In this paper, we apply it to three kinds of classic hash-based data structures, namely Bloom filters, CM sketches, and d-left hash tables, as case studies, and evaluate their performance with both mathematical analysis and extensive experiments. We make all our code open source on GitHub.
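
The idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which bit operations the authors use, so the shift-and-XOR scheme in `derive_indices` below is an assumption made for illustration, not the paper's method. The names `base_hash32`, `derive_indices`, and `SingleHashBloomFilter` are hypothetical, and the first four bytes of MD5 stand in for an arbitrary 32-bit hash function.

```python
import hashlib
from typing import List


def base_hash32(key: bytes) -> int:
    """Compute the single 32-bit hash value (here: first 4 bytes of MD5, an arbitrary stand-in)."""
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big")


def derive_indices(h: int, k: int, m_bits: int) -> List[int]:
    """Derive k indices in [0, 2**m_bits) from one 32-bit value h.

    Assumed scheme (not taken from the paper): the first index is the plain
    modulo result, and each further index XORs the low bits with a shifted
    copy of the high bits that the modulo would otherwise discard.
    """
    mask = (1 << m_bits) - 1
    low = h & mask        # what h mod 2**m_bits keeps
    high = h >> m_bits    # the 32 - m_bits bits that would be discarded
    indices = [low]
    for i in range(1, k):
        indices.append((low ^ (high << i)) & mask)
    return indices[:k]


class SingleHashBloomFilter:
    """Bloom filter that computes one hash per key and derives k indices from it."""

    def __init__(self, m_bits: int = 20, k: int = 4) -> None:
        self.m_bits = m_bits                       # bit array length is 2**m_bits
        self.k = k
        self.bits = bytearray((1 << m_bits) // 8)  # assumes m_bits >= 3

    def _indices(self, key: bytes) -> List[int]:
        return derive_indices(base_hash32(key), self.k, self.m_bits)

    def add(self, key: bytes) -> None:
        for idx in self._indices(key):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[idx >> 3] & (1 << (idx & 7)) for idx in self._indices(key))
```

With `m_bits = 20`, the 12 high bits of each 32-bit hash are the ones reused. In this simplified scheme, keys whose high bits are all zero map to identical indices, so it does not demonstrate the accuracy properties claimed in the abstract.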