Abstract: Executing queries in isolation forgoes the opportunity to reuse intermediate results, which wastes computational resources. Multi-Query Optimization (MQO) addresses this challenge by devising a shared execution strategy across queries, with two commonly used strategies: batching and caching. Both strategies have been shown to improve performance, but hardly any study explores their combination. In this work we explore such a hybrid MQO, combining batching (Shared Sub-Expression) and caching (Materialized View Reuse) techniques. Our hybrid MQO system merges batched query results and also caches the intermediate results, so that any new query is given a path within the previous plan and reuses those results. Since caching is a key component for improving performance, we measure the impact of common caching techniques such as FIFO, LRU, MRU and LFU. Our results show LRU to be optimal for our use case, so we use it in our subsequent evaluations. To study the influence of batching, we vary derivability, a factor that represents the similarity of the results within a query batch. Similarly, we vary the cache size to study the influence of caching. Moreover, we also study the role of different database operators in the performance of our hybrid system. The results suggest that, depending on the individual operators, our hybrid method ranges from a speed-up of 4x to a slowdown of 2x compared with using MQO techniques in isolation. Furthermore, our results show that workloads that contain similar queries and have a generously sized cache benefit from our hybrid method, with an observed speed-up of 2x over sequential execution in the best case.

Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop.

Optimization Factor Analysis Of Large-Scale Join Queries On Different Platforms

Query optimization for massively parallel data processing.

AQP++: Connecting Approximate Query Processing with Aggregate Precomputation for Interactive Analytics

Accelerating Apache Hive with MPI for Data Warehouse Systems

AQUA+: Query Optimization for Hybrid Database-MapReduce System.

Exploiting Shared Sub-Expression and Materialized View Reuse for Multi-Query Optimization

A Query Execution Scheduling Scheme for Impala System.

SAQP++: Bridging the Gap Between Sampling-Based Approximate Query Processing and Aggregate Precomputation.

Cost-Based Optimization Of Logical Partitions For A Query Workload In A Hadoop Data Warehouse

Logical Query Optimization for Cloudera Impala System

Efficient Multi-way Theta-Join Processing Using MapReduce

Service-oriented Execution Model Supporting Data Sharing and Adaptive Query Processing

Optimization for Iterative Queries on MapReduce

Multiple Query Optimization in PBASE/3

Optimization of sub-query processing in distributed data integration systems

New Distributed Spatial Query Optimization Approach by Using Query Analyzer

MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases

HyMJ: A Hybrid Structure-Aware Approach to Distributed Multi-way Join Query

Optimizing Communications in Processing Data Integration Queries

Optimizing Internal Overlaps by Self-Adjusting Resource Allocation in Multi-Stage Computing Systems