A General Analytical Model for Spatial and Temporal Performance of Bitmap Index Compression Algorithms in Big Data
Yinjun Wu, Zhen Chen, Yuhao Wen, Junwei Cao, Wenxun Zheng, Ge Ma
DOI: https://doi.org/10.1109/icccn.2015.7288362
2015-01-01
Abstract: Bitmap indexes support flexible Boolean operations in data retrieval, and query processing based on them is fast. They are therefore widely used in big data analytics platforms such as Druid and Spark. However, a bitmap index can consume a large amount of memory, which has led to a variety of bitmap index compression algorithms that aim to reduce space without sacrificing temporal performance. In practice, choosing a suitable algorithm for a specific problem is often difficult. Moreover, when a new algorithm is devised that may outperform existing ones, its performance should be evaluated in theory. Without appropriate theoretical analysis, the shortcomings of a new algorithm are spotted only after the final experimental results are obtained, which wastes time and effort. In this paper, we propose a general analytical model of both the spatial and temporal performance of bitmap index compression algorithms; it can be applied to any algorithm derived from WAH (Word-Aligned Hybrid). The model treats two types of bitmaps separately: uniformly distributed bitmaps and clustered bitmaps. To illustrate the model, we analyze and compare several bitmap index compression algorithms: COMBAT (COMbining Binary And Ternary encoding), SECOMPAX (Scope Extended COMPAX) and CONCISE (Compressed 'n' Composable Integer Set), all of which are derived from WAH. Evaluation results from MATLAB simulations of these algorithms are also presented. This paper paves the way for further research on the performance evaluation of bitmap index compression algorithms.
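To make the distinction between uniformly distributed and clustered bitmaps concrete, the following is a minimal Python sketch, not the paper's model or its MATLAB simulation. It counts the words a simplified WAH-style encoder would emit, assuming 32-bit words with 31 payload bits. The function names (`wah_word_count`, `uniform_bitmap`, `clustered_bitmap`), the two-state Markov chain used to generate clustered bitmaps, and all parameter values are assumptions made for illustration.

```python
import random

GROUP_BITS = 31  # assumed: 32-bit WAH words carry 31 payload bits


def wah_word_count(bits):
    """Count the words a simplified WAH encoder would emit for `bits`.

    The bitmap is cut into 31-bit groups. A group of all 0s or all 1s is a
    fill; consecutive fills of the same value share one fill word. Every
    other group costs one literal word. The fill counter limit is ignored.
    """
    words = 0
    prev_fill = None  # bit value of the fill run in progress, if any
    for start in range(0, len(bits), GROUP_BITS):
        group = bits[start:start + GROUP_BITS]
        # A short trailing group is treated as a literal for simplicity.
        if len(group) == GROUP_BITS and all(b == group[0] for b in group):
            if prev_fill != group[0]:
                words += 1  # a new fill word starts here
            prev_fill = group[0]
        else:
            words += 1  # literal word
            prev_fill = None
    return words


def uniform_bitmap(n, density, rng):
    """Each bit is set independently with probability `density`."""
    return [1 if rng.random() < density else 0 for _ in range(n)]


def clustered_bitmap(n, density, mean_run, rng):
    """Two-state Markov chain (an assumption of this sketch).

    `mean_run` is the average length of a run of 1s; the transition
    probabilities are chosen so the long-run bit density is `density`.
    """
    p_leave = 1.0 / mean_run
    p_enter = p_leave * density / (1.0 - density)
    bits, state = [], 0
    for _ in range(n):
        if state == 1:
            if rng.random() < p_leave:
                state = 0
        elif rng.random() < p_enter:
            state = 1
        bits.append(state)
    return bits


if __name__ == "__main__":
    rng = random.Random(0)
    n, density = 31000, 0.05
    print("uniform  :", wah_word_count(uniform_bitmap(n, density, rng)))
    print("clustered:", wah_word_count(clustered_bitmap(n, density, 50, rng)))
```

At the same bit density, the clustered bitmap should need far fewer words than the uniform one, because long runs of identical bits collapse into fill words. This is why an analytical model has reason to treat the two distributions separately.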