MASC: A Bitmap Index Encoding Algorithm for Fast Data Retrieval

Yuhao Wen, Han Wang, Zhen Chen, Junwei Cao, Guodong Peng, Wen-Liang Huang, Ziwei Hu, Jing Zhou, Jinghong Guo
DOI: https://doi.org/10.1109/icc.2016.7510827
2016-01-01
Abstract: Fast retrieval from archival traffic data is essential for network security and forensic analysis. A bitmap index is a data structure that enables fast search over large data collections, but its space consumption is a persistent problem. WAH, PLWAH and COMPAX have been proposed to compress bitmap indexes and reduce storage. In this paper, a new bitmap index encoding scheme, named MASC, is proposed to further improve the compression ratio without impairing query performance. Instead of being limited to a fixed length (31 bits) as in PLWAH and COMPAX, the stride size in MASC can be as long as possible, so that consecutive zero bits and nonzero bits are encoded more compactly. Because the piggyback used in PLWAH carries only an individual nonzero bit, MASC introduces a new structure called a carrier in its place. We also generalize the traditional literal word concept used in PLWAH and COMPAX. The validity of the MASC encoding scheme is demonstrated by applying it in an Internet traffic archival system. In experiments with a real Internet traffic data set from CAIDA, MASC achieves a better compression ratio than PLWAH and COMPAX2 with no penalty in query performance.
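
To make the fixed 31-bit limit concrete, the following is a minimal sketch of a simplified WAH-style encoder, the baseline family that the abstract compares against. It is not MASC: the abstract does not specify the carrier layout or the stride encoding, so neither is attempted here. The function name `wah_encode` and the exact word layout (1 flag bit, 1 fill-value bit, 30-bit run counter) are assumptions chosen for illustration.

```python
from typing import List

GROUP = 31  # payload bits per 32-bit word in WAH-style schemes


def wah_encode(bits: List[int]) -> List[int]:
    """Encode a list of 0/1 values into 32-bit WAH-style words (simplified sketch)."""
    # Pad with zeros so the input splits evenly into 31-bit groups.
    padded = bits + [0] * (-len(bits) % GROUP)
    words: List[int] = []
    for i in range(0, len(padded), GROUP):
        group = padded[i:i + GROUP]
        value = int("".join(map(str, group)), 2)
        if value == 0 or value == (1 << GROUP) - 1:
            # Fill word: MSB = 1, next bit = fill value, low 30 bits = run length in groups.
            fill_bit = 1 if value else 0
            header = (1 << 31) | (fill_bit << 30)
            same_kind = bool(words) and (words[-1] >> 30) == (header >> 30)
            if same_kind and (words[-1] & 0x3FFFFFFF) < 0x3FFFFFFF:
                words[-1] += 1  # extend the previous run of identical groups
            else:
                words.append(header | 1)
        else:
            # Literal word: MSB = 0, low 31 bits hold the group verbatim.
            words.append(value)
    return words


# Two all-zero groups followed by a group containing a single set bit.
encoded = wah_encode([0] * 62 + [1] + [0] * 30)
print([hex(w) for w in encoded])
```

In this sketch the two all-zero groups collapse into one fill word, but the group holding a single set bit still costs a full literal word. That overhead on sparse groups is what piggybacking in PLWAH targets, and what the abstract says MASC addresses with its carrier and variable stride.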