Optimizing Disjunctive Queries with Tagged Execution

Albert Kim, Samuel Madden
2024-04-23
Abstract: Despite decades of research into query optimization, optimizing queries with disjunctive predicate expressions remains a challenge. Solutions employed by existing systems (if any) are often simplistic and lead to much redundant work being performed by the execution engine. To address these problems, we propose a novel form of query execution called tagged execution. Tagged execution groups tuples into subrelations based on which predicates in the query they satisfy (or don't satisfy) and tags them with that information. These tags then provide additional context for query operators to take advantage of during runtime, allowing them to eliminate much of the redundant work performed by traditional engines and realize predicate pushdown optimizations for disjunctive predicates. However, tagged execution brings its own challenges, and the question of what tags to create is a nontrivial one. Careless creation of tags can lead to an exponential blowup in the tag space, with the overhead outweighing the benefits. To address this issue, we present a technique called tag generalization to minimize the space of tags. We implemented the tagged execution model with tag generalization in our system Basilisk, and our evaluation shows an average 2.7x speedup in runtime over the traditional execution model with up to a 19x speedup in certain situations.
Databases
What problem does this paper attempt to address?
This paper addresses the problem of optimizing queries that contain disjunctive predicate expressions (disjunctive queries). Although there have been decades of research in query optimization, existing systems still struggle with such queries. Their solutions are often too simplistic and cause the execution engine to perform a large amount of redundant work. Specifically, the paper points out the following problems with how current systems handle disjunctive predicates:

1. **Limitations of existing methods**:
   - **Filtering after joining**: Perform the joins first, then evaluate the predicate expression on the join results. This is equivalent to no optimization at all. As the number of joins increases, the size of the join results can grow exponentially, which significantly increases running time.
   - **Decomposing into multiple queries**: Treat each disjunct as an independent query, apply predicate pushdown to each one separately, and combine the results with a union. Although this achieves predicate pushdown, tuples that satisfy more than one disjunct are constructed repeatedly, and the union must then filter out the duplicates. Moreover, this method applies only to predicate expressions in disjunctive normal form (DNF), not to those in conjunctive normal form (CNF).
2. **Redundant work and performance issues**: Existing methods cannot effectively avoid redundant computation, especially for complex queries, which degrades performance.

To address these problems, the authors propose a new query execution model: **tagged execution**.
Tagged execution attaches tags to tuples that record which predicates in the query they satisfy (or do not satisfy). Query operators use these tags at runtime to avoid redundant work and to achieve predicate pushdown for disjunctive predicates. However, tagged execution also brings new challenges: if tags are created carelessly, the tag space can blow up exponentially and the overhead can outweigh the benefits. For this reason, the authors propose the **tag generalization** technique to minimize the tag space and keep tagged execution efficient. In summary, the paper introduces the tagged execution model and tag generalization to optimize queries containing disjunctive predicates, and reports an average 2.7x runtime speedup over the traditional execution model in its evaluation.
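To make the contrast between the two existing strategies and tagged execution more concrete, here is a minimal Python sketch for a toy query of the form `R JOIN S ON R.k = S.k WHERE R.a < 2 OR S.b > 8`. The relations `R` and `S`, the predicates, and every function name are hypothetical and chosen only for illustration. This is not Basilisk's implementation, it uses a plain nested-loop join, and it does not model tag generalization.

```python
from itertools import product

# Hypothetical toy relations: R(a, k) and S(k, b).
R = [(1, 10), (5, 10), (7, 20), (0, 30)]
S = [(10, 9), (10, 3), (20, 4), (30, 2)]


def p_r(r):
    """Predicate on R tuples: R.a < 2."""
    return r[0] < 2


def p_s(s):
    """Predicate on S tuples: S.b > 8."""
    return s[1] > 8


def join(left, right):
    """Nested-loop equi-join on R.k = S.k (deliberately simple)."""
    return [(r, s) for r, s in product(left, right) if r[1] == s[0]]


def filter_after_join():
    """Strategy 1: join everything, then evaluate the disjunction."""
    return [(r, s) for r, s in join(R, S) if p_r(r) or p_s(s)]


def union_of_pushed_down():
    """Strategy 2: one pushed-down query per disjunct, then a deduplicating union."""
    branch_r = join([r for r in R if p_r(r)], S)
    branch_s = join(R, [s for s in S if p_s(s)])
    # Pairs satisfying both disjuncts were built twice; drop the duplicates.
    return list(dict.fromkeys(branch_r + branch_s))


def tagged_execution():
    """Sketch of the tagging idea: group tuples by predicate outcome and
    join only the group combinations that can satisfy the disjunction."""
    r_groups = {True: [], False: []}
    for r in R:
        r_groups[p_r(r)].append(r)  # the tag is the outcome of R.a < 2
    s_groups = {True: [], False: []}
    for s in S:
        s_groups[p_s(s)].append(s)  # the tag is the outcome of S.b > 8

    out = []
    for r_tag, s_tag in product((True, False), repeat=2):
        if r_tag or s_tag:  # the (False, False) combination is never joined
            out.extend(join(r_groups[r_tag], s_groups[s_tag]))
    return out


if __name__ == "__main__":
    for strategy in (filter_after_join, union_of_pushed_down, tagged_execution):
        print(strategy.__name__, sorted(strategy()))
```

On this toy input all three strategies return the same four joined pairs. The union variant builds one of those pairs twice before deduplicating it, while the tagged variant never joins the group combination that fails both predicates and needs neither a post-join filter nor a deduplication step.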