Index-Based OLAP Aggregation for In-Memory Cluster Computing

Yu Wang, Xiaojun Ye
DOI: https://doi.org/10.1109/CCBD.2014.13
2014-01-01
Abstract: In this paper, we present an OLAP aggregation approach based on key-value indexes for building an OLAP engine in an in-memory cluster computing environment. First, we generate a binary code for each dimension attribute value and a mask code for each dimension attribute. Key-value indexes are then constructed in which the key is a binary surrogate key, formed by combining the binary codes of the attribute values, and the value is the list of matching fact table row ids. Our implementation uses Spark, whose resilient distributed datasets (RDDs) let users persist intermediate results in local memory. We execute query filters on the key-value indexes through RDD transformations and obtain the qualifying fact row ids by intersecting the results of the different RDDs. In this way, we avoid complex join operations and multiple fact table scans, and we further improve response time by transforming RDDs in local machine memory. Different aggregations can then be computed in a single pass over the fact table. Finally, we evaluate the proposed OLAP prototype against two open source ROLAP engines, Shark and Hive, on the TPC-DS benchmark data set. Experimental results demonstrate that our OLAP prototype outperforms both.
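The indexing and filtering steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it runs on a single machine without Spark, and the bit layout, the function names (`build_codes`, `build_index`, `filter_index`, `aggregate`), and the toy tables are all assumptions made for illustration. In the paper, the filtering and intersection steps are expressed as RDD transformations.

```python
from collections import defaultdict

def build_codes(dim_rows, attributes):
    """Give each attribute a bit field (its mask) and each value a code inside that field."""
    codes, masks, shift = {}, {}, 0
    for attr in attributes:
        values = sorted({row[attr] for row in dim_rows.values()})
        width = max(1, (len(values) - 1).bit_length())  # bits needed for this attribute
        masks[attr] = ((1 << width) - 1) << shift
        codes[attr] = {v: i << shift for i, v in enumerate(values)}
        shift += width
    return codes, masks

def build_index(fact_rows, fk, dim_rows, codes):
    """Map each composite binary key to the list of fact row ids that carry it."""
    index = defaultdict(list)
    for row_id, fact in enumerate(fact_rows):
        dim = dim_rows[fact[fk]]
        key = 0
        for attr, value_codes in codes.items():
            key |= value_codes[dim[attr]]  # combine the per-attribute codes
        index[key].append(row_id)
    return index

def filter_index(index, codes, masks, attr, value):
    """Return the fact row ids whose key matches `value` under the attribute's mask."""
    target, mask = codes[attr][value], masks[attr]
    return {rid for key, rids in index.items() if (key & mask) == target for rid in rids}

def aggregate(fact_rows, row_ids, measure):
    """Compute several aggregates in one pass over the selected fact rows."""
    total, count, maximum = 0.0, 0, float("-inf")
    for rid in row_ids:
        v = fact_rows[rid][measure]
        total, count, maximum = total + v, count + 1, max(maximum, v)
    return {"sum": total, "count": count, "max": maximum}

# Hypothetical toy star schema (not TPC-DS data).
date_dim = {1: {"year": 2001, "quarter": "Q1"},
            2: {"year": 2001, "quarter": "Q2"},
            3: {"year": 2002, "quarter": "Q1"}}
store_dim = {10: {"state": "CA"}, 11: {"state": "NY"}}
facts = [{"date_id": 1, "store_id": 10, "sales": 5.0},
         {"date_id": 2, "store_id": 10, "sales": 7.0},
         {"date_id": 3, "store_id": 11, "sales": 3.0},
         {"date_id": 1, "store_id": 11, "sales": 4.0},
         {"date_id": 3, "store_id": 10, "sales": 6.0}]

date_codes, date_masks = build_codes(date_dim, ["year", "quarter"])
store_codes, store_masks = build_codes(store_dim, ["state"])
date_index = build_index(facts, "date_id", date_dim, date_codes)
store_index = build_index(facts, "store_id", store_dim, store_codes)

# Query: total sales for year = 2001 AND state = 'CA', with no join on the fact table.
matched = (filter_index(date_index, date_codes, date_masks, "year", 2001)
           & filter_index(store_index, store_codes, store_masks, "state", "CA"))
result = aggregate(facts, sorted(matched), "sales")
print(matched, result)
```

Each filter touches only the index keys, so the fact table is read once, during aggregation, and only for the rows that survive the intersection.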