Inverted Index Compression Algorithms Based on 64-bit Architecture

Xu-dong ZHANG, Zhi-ming SUN, Ya-ning LIU, Dong-dong SHAN, Hong-fei YAN
DOI: https://doi.org/10.3969/j.issn.1000-3428.2014.02.016
2014-01-01
Abstract: On 64-bit CPU architectures, the word length grows from 32 bits to 64 bits, and the amount of data the CPU can process in a single operation grows to 64 bits as well. Few studies to date have examined how 64-bit systems affect the compression and decompression of the inverted index, the primary data structure in search engines. Some posting-list compression algorithms work well on 32-bit machines but are inefficient on 64-bit machines. This paper proposes three word-aligned compression algorithms for 64-bit systems: SimpleX64-16, SimpleX64-32, and SimpleX64-64. Each algorithm adopts more modes, and each mode is optimized. Experiments on inverted indexes built from GOV2 and ClueWeb09B show that, on 64-bit machines, these algorithms improve the compression ratio by 2.5% and the decompression rate by 14.5% compared with traditional 32-bit word-aligned compression algorithms.
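To make the idea of word-aligned compression with multiple modes concrete, here is a minimal sketch of a generic Simple-family packer that targets 64-bit words. It is not the SimpleX64 algorithms themselves, whose mode tables the abstract does not give. The layout (a 4-bit selector plus 60 payload bits), the MODES table, and the function names encode, decode, to_gaps and from_gaps are all assumptions made for illustration, as is the use of document-ID gaps as input.

```python
# Illustrative only: a generic Simple-family scheme, NOT the paper's SimpleX64.
# Assumed layout: the low 4 bits of each 64-bit word hold a mode selector and
# the remaining 60 bits hold `count` integers of `bits` bits each.
MODES = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
         (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def to_gaps(doc_ids):
    """Turn a sorted posting list into gaps between consecutive IDs."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the sorted posting list from its gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def encode(values):
    """Greedily pack non-negative integers into 64-bit words."""
    words, i = [], 0
    while i < len(values):
        # Try the densest mode first; take the first one that fits.
        for selector, (count, bits) in enumerate(MODES):
            chunk = values[i:i + count]
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector
                for k, v in enumerate(chunk):
                    word |= v << (4 + k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value is negative or does not fit in 60 bits")
    return words


def decode(words):
    """Unpack words produced by encode()."""
    out = []
    for word in words:
        count, bits = MODES[word & 0xF]
        mask = (1 << bits) - 1
        payload = word >> 4
        for k in range(count):
            out.append((payload >> (k * bits)) & mask)
    return out
```

A round trip would look like from_gaps(decode(encode(to_gaps(doc_ids)))). Each word is decoded with one table lookup followed by fixed-width shifts and masks, which is the property that makes word-aligned schemes fast to decompress. A wider word leaves room for more payload bits per selector, and so for more modes.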