LBFM: Multi-Dimensional Membership Index for Block-Level Data Skipping

Yong Wang, Xiaochun Yun, Xi Wang, Shupeng Wang, Yongshang Wu
DOI: https://doi.org/10.1109/ISPA/IUCC.2017.00056
2017-01-01
Abstract: Data skipping is a promising technique for reducing data access in query engines. By maintaining metadata for each block of tuples, a query can skip a block when the metadata indicates that the block contains no relevant data. The key factor is how to build effective metadata by extracting representative features of blocks. In this paper, we propose a multi-dimensional index, the Layered Bloom Filter Matrix (LBFM), which adopts a recursively layered framework and represents the matrix as an ordered hierarchy of hashmaps and bitmaps rather than as a space-consuming bit matrix. LBFM also supports dimension combination cutting, from which an optimal indexing strategy can be generated, further improving space efficiency. We analyze the time complexity of LBFM and theoretically prove that it consumes less space than the Bloom Filter Matrix algorithm. We prototyped our index technique on Spark SQL. Our experiments on TPC-H and a real-world workload show that LBFM significantly improves query response time over traditional methods.
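The block-skipping idea in the abstract can be illustrated with a minimal sketch. This is not LBFM itself: it assumes one ordinary Bloom filter per block over a single column, with no layering, hashmap/bitmap hierarchy, or dimension combination cutting. The names `BloomFilter`, `build_block_metadata`, and `scan_equal`, as well as the filter size and hash count, are hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive one bit position per hash by salting the value with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_block_metadata(blocks, column):
    """Build one Bloom filter per block over the given column."""
    filters = []
    for block in blocks:
        bf = BloomFilter()
        for row in block:
            bf.add(row[column])
        filters.append(bf)
    return filters


def scan_equal(blocks, filters, column, value):
    """Return rows where column == value, plus the number of blocks skipped."""
    matches, skipped = [], 0
    for block, bf in zip(blocks, filters):
        if not bf.might_contain(value):
            skipped += 1  # metadata proves the block holds no matching row
            continue
        matches.extend(row for row in block if row[column] == value)
    return matches, skipped


if __name__ == "__main__":
    blocks = [
        [{"city": "Oslo"}, {"city": "Rome"}],
        [{"city": "Lima"}, {"city": "Cairo"}],
        [{"city": "Rome"}, {"city": "Kyoto"}],
    ]
    filters = build_block_metadata(blocks, "city")
    print(scan_equal(blocks, filters, "city", "Rome"))
```

Because a Bloom filter never reports a false negative, skipping a block cannot drop a matching row. A false positive only costs an unnecessary block read. A multi-dimensional index such as the one the abstract describes would need to answer membership for combinations of columns, which this single-column sketch does not attempt.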