Index Compression for BitFunnel Query Processing

Xinyu Liu, Zhaohua Zhang, Boran Hou, Rebecca J. Stones, Gang Wang, Xiaoguang Liu
DOI: https://doi.org/10.1145/3209978.3210086
2018-01-01
Abstract: Large-scale search engines utilize inverted indexes, which store ordered lists of document identifiers (docIDs) relevant to query terms and which can be queried thousands of times per second. In order to reduce storage requirements, we propose a dictionary-based compression approach for the recently proposed bitwise data structure BitFunnel, which makes use of a Bloom filter. Compression is achieved by storing frequently occurring blocks in a dictionary. Infrequently occurring blocks (those not represented in the dictionary) are instead referenced using similar blocks that are in the dictionary, which introduces additional false positive errors. We further introduce a docID reordering strategy to improve compression. Experimental results indicate an improvement in compression of 27% to 30%, at the expense of increasing the query processing time by 16% to 48% and increasing the false positive rate by around 7.6 to 10.7 percentage points.
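
The dictionary idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the block width `BLOCK_BITS`, the function names, the all-ones fallback entry, and the rule of replacing a rare block with the closest dictionary block that contains all of its set bits are assumptions made for illustration. The superset rule is one simple way to ensure that substitution can only add false positives and never drops a true match.

```python
from collections import Counter

BLOCK_BITS = 8  # hypothetical block width, chosen only for this example


def build_dictionary(blocks, size):
    """Keep the `size` most frequent blocks, plus an all-ones fallback (assumption)."""
    all_ones = (1 << BLOCK_BITS) - 1
    entries = [block for block, _ in Counter(blocks).most_common(size)]
    if all_ones not in entries:
        # The all-ones block is a superset of every block, so a substitute always exists.
        entries.append(all_ones)
    return entries


def encode(blocks, dictionary):
    """Replace each block by the index of a dictionary entry."""
    index = {block: i for i, block in enumerate(dictionary)}
    codes = []
    for block in blocks:
        if block in index:
            codes.append(index[block])
            continue
        # Rare block: consider only entries that keep every set bit of the block,
        # then pick the one that adds the fewest extra 1-bits (fewest new false positives).
        supersets = [d for d in dictionary if (block & ~d) == 0]
        best = min(supersets, key=lambda d: bin(d ^ block).count("1"))
        codes.append(index[best])
    return codes


def decode(codes, dictionary):
    """Look each code up in the dictionary; rare blocks come back as their substitutes."""
    return [dictionary[code] for code in codes]
```

With a dictionary built from the two most frequent blocks, a rare block such as `0b00000111` would be stored as the index of `0b00001111`: the extra set bit may report a document that does not match, but no matching document is lost.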