RPK-table Based Efficient Algorithm for Join-Aggregate Query on MapReduce.

Zhan Li, Qi Feng, Wei Chen, Tengjiao Wang
DOI: https://doi.org/10.1016/j.trit.2016.03.008
IF: 7.985
2016-01-01
CAAI Transactions on Intelligence Technology
Abstract: Join-aggregate is an important and widely used operation in database systems. However, processing join-aggregate queries is time-consuming in big data environments, especially on the MapReduce framework. There are two main bottlenecks: heavy I/O caused by temporary data, and high communication overhead between data nodes during query processing. To overcome these bottlenecks, we design a data structure called the Reference Primary Key table (RPK-table), which stores the primary key and foreign key relationships between tables. Based on this structure, we propose an improved algorithm for join-aggregate queries on the MapReduce framework. Experiments on the TPC-H dataset demonstrate that our algorithm outperforms existing methods in terms of communication cost and query response time.
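Illustration: the abstract does not specify the layout of the RPK-table or the steps of the algorithm, so the following is only a minimal single-process sketch of the general idea, not the authors' method. It assumes the RPK-table can be modeled as a plain lookup from a table's primary key to the key it references, and that the aggregate is a sum. The function names `build_rpk_table` and `join_aggregate` are hypothetical; the column names follow the TPC-H `orders` and `lineitem` tables.

```python
from collections import defaultdict


def build_rpk_table(rows, primary_key, foreign_key):
    """Assumed RPK-table layout: primary-key value -> referenced key value."""
    return {row[primary_key]: row[foreign_key] for row in rows}


def join_aggregate(fact_rows, rpk_table, fact_foreign_key, measure):
    """Sum `measure` per referenced key, resolving the join by lookup.

    Each fact row is resolved against the compact key mapping instead of
    the full joined table, so no joined intermediate rows are materialized.
    """
    totals = defaultdict(float)
    for row in fact_rows:
        group_key = rpk_table.get(row[fact_foreign_key])
        if group_key is not None:  # inner-join semantics: skip dangling keys
            totals[group_key] += row[measure]
    return dict(totals)


# Toy data shaped like TPC-H orders and lineitem (values are made up).
orders = [
    {"o_orderkey": 1, "o_custkey": 10},
    {"o_orderkey": 2, "o_custkey": 20},
    {"o_orderkey": 3, "o_custkey": 10},
]
lineitem = [
    {"l_orderkey": 1, "l_quantity": 5.0},
    {"l_orderkey": 2, "l_quantity": 3.0},
    {"l_orderkey": 3, "l_quantity": 2.0},
    {"l_orderkey": 4, "l_quantity": 9.0},  # no matching order, dropped
]

rpk = build_rpk_table(orders, "o_orderkey", "o_custkey")
totals = join_aggregate(lineitem, rpk, "l_orderkey", "l_quantity")
print(totals)
```

In a MapReduce setting, a mapping of this kind could in principle be shipped to each mapper so that only partial aggregates are shuffled, which is the sort of saving in temporary data and communication the abstract refers to; how the paper actually achieves it is not stated here.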