Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop

Ling Chen, Yan Lin, Jingchang Wang, Heqing Huang, Donghui Chen, Yong Wu
DOI: https://doi.org/10.1002/cpe.4676
2018-01-01
Abstract: In the past few years, executing high-concurrency queries with interactive SQL query engines on Hadoop has become an important activity for many organizations. However, these systems do not adopt Multi-Query Optimization (MQO) to accelerate query processing. There are two major concerns. First, traditional MQO research assumes that the queries being optimized together are highly similar. These systems, however, usually serve a variety of applications: although queries from the same application are highly similar, queries from different applications may have little in common, so applying traditional MQO would be inefficient and time-consuming. Second, integrating MQO may require extensive modifications to the system. To integrate MQO into interactive SQL query engines on Hadoop efficiently, a query grouping-based MQO framework is proposed. A lightweight mechanism is used to represent SQL queries, and a grouping method built on this representation is exploited to speed up the optimization process. A cost model is integrated to estimate the execution cost of interactive SQL query engines on Hadoop. Using the proposed framework, we modify the Impala system to support MQO, and experimental results on TPC-DS show significant performance improvements.
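To make the grouping idea concrete, here is a minimal sketch in Python. It is not the authors' method. It assumes, purely for illustration, that each query's lightweight representation is the set of table names it references. It also assumes that similarity is measured with the Jaccard index and that groups are formed greedily using a hypothetical `threshold` parameter. The functions `jaccard` and `group_queries` are hypothetical names. The point the sketch shows is the one the abstract makes: queries from different applications land in separate groups, so the expensive MQO step only needs to compare queries within a group.

```python
from typing import Dict, FrozenSet, List

# Hypothetical lightweight representation (an assumption, not the paper's):
# a query is reduced to the set of table names it references.
QueryRep = FrozenSet[str]


def jaccard(a: QueryRep, b: QueryRep) -> float:
    """Jaccard similarity |A & B| / |A | B| between two query representations."""
    if not a and not b:
        return 1.0  # two empty representations are treated as identical
    return len(a & b) / len(a | b)


def group_queries(queries: Dict[str, QueryRep], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping (hypothetical): each query joins the first
    group whose representative (its first member) is at least `threshold`
    similar; otherwise it starts a new group. MQO would then run per group."""
    groups: List[List[str]] = []
    reps: List[QueryRep] = []
    for qid, rep in queries.items():
        for i, group_rep in enumerate(reps):
            if jaccard(rep, group_rep) >= threshold:
                groups[i].append(qid)
                break
        else:  # no sufficiently similar group found
            groups.append([qid])
            reps.append(rep)
    return groups


if __name__ == "__main__":
    # Illustrative workload using TPC-DS table names; the queries are made up.
    workload = {
        "q1": frozenset({"store_sales", "date_dim", "item"}),
        "q2": frozenset({"store_sales", "date_dim"}),
        "q3": frozenset({"web_sales", "customer"}),
        "q4": frozenset({"web_sales", "customer", "customer_address"}),
    }
    print(group_queries(workload))  # [['q1', 'q2'], ['q3', 'q4']]
```

The greedy single pass keeps grouping cost linear in the number of groups per query, which fits the goal of speeding up optimization. A real implementation would likely use a richer representation (predicates, join graphs) and the framework's cost model to decide whether sharing work within a group actually pays off.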