A Simple Algorithm for Worst-Case Optimal Join and Sampling

Florent Capelli, Oliver Irwin, Sylvain Salvati
2024-09-21
Abstract: We present an elementary branch and bound algorithm with a simple analysis of why it achieves worst-case optimality for join queries on classes of databases defined respectively by cardinality or acyclic degree constraints. We then show that if one is given a reasonable way of recursively estimating upper bounds on the number of answers of the join queries, our algorithm can be turned into an algorithm for uniformly sampling answers with expected running time $O(UP/OUT)$, where $UP$ is the upper bound, $OUT$ is the actual number of answers and $O(\cdot)$ ignores polylogarithmic factors. Our approach recovers recent results on worst-case optimal join algorithms and sampling in a modular, clean and elementary way.
Databases
What problem does this paper attempt to address?
The core problem this paper addresses is the design of a simple, easy-to-analyze branch-and-bound algorithm that performs worst-case optimal join evaluation and uniform sampling of answers on classes of databases defined by cardinality constraints or acyclic degree constraints. Specifically, the goals of the paper include:

1. **Worst-case optimal join queries**: For a given join query \(Q\) and a class \(C\) of databases, the algorithm outputs all answers in time \(\tilde{O}(wc(C))\), where \(wc(C)\) is the largest number of answers of \(Q\) over the databases in \(C\).
2. **Uniform sampling**: If an upper bound on the number of answers can be estimated recursively, the same algorithm can be turned into a uniform sampler with expected running time \(\tilde{O}\left(\frac{UP}{OUT}\right)\), where \(UP\) is the upper bound and \(OUT\) is the actual number of answers.

### Main contributions of the paper

1. **Simple branch-and-bound algorithm**:
   - A very simple branch-and-bound algorithm is proposed. It assigns the value of each variable bit by bit and abandons a partial assignment as soon as it cannot lead to an answer, which avoids unnecessary exploration.
   - The algorithm does not require complex auxiliary data structures.
2. **Proof of worst-case optimality**:
   - It is proved that the algorithm is worst-case optimal on classes defined by cardinality constraints or acyclic degree constraints, without needing to know the worst-case instances of these classes.
   - The concept of "prefix closedness" is introduced, and it is proved that the algorithm is worst-case optimal on every class satisfying this property.
3. **Uniform sampling method**:
   - It is shown how to use the algorithm for uniform sampling. Its complexity matches that of previous, more complex techniques, but the method is simpler and more modular.
   - Using the weak form of Friedgut's inequality as a technical black box, it is proved that on classes defined by cardinality constraints or acyclic degree constraints, uniform sampling can be achieved in expected time \(\tilde{O}\left(\frac{UP}{OUT}\right)\).

### Formula summary

- Worst-case optimality complexity: \(\tilde{O}(wc(C))\)
- Expected running time for uniform sampling: \(\tilde{O}\left(\frac{UP}{OUT}\right)\)
- Upper bound on the number of answers of the triangle query \(Q_\Delta\): \((N_R N_S N_T)^{1/2}\)

Through these contributions, the paper provides a concise and effective method for worst-case optimal join evaluation and uniform sampling of join answers, while avoiding the complexity of existing methods.
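As a rough illustration of the bit-by-bit branch-and-bound idea summarized above, here is a minimal Python sketch. It is not the authors' implementation: the function name `branch_and_bound_join`, the data layout, and the example relations are assumptions made for this illustration. In particular, the support check below scans every relation linearly, so the sketch does not achieve the \(\tilde{O}(wc(C))\) running time; a real implementation would need an index that answers this check in polylogarithmic time.

```python
def branch_and_bound_join(variables, relations, nbits):
    """Enumerate the answers of a join query by fixing variables bit by bit.

    variables: list of variable names, in the order they are assigned.
    relations: list of (attrs, tuples) pairs; values are ints in [0, 2**nbits).
    Hypothetical helper written for illustration only.
    """
    n = len(variables)
    pos = {v: i for i, v in enumerate(variables)}
    answers = []

    def compatible(tup, attrs, prefix):
        # prefix[i] = (high, k): the k most significant bits of variable i.
        for attr, value in zip(attrs, tup):
            high, k = prefix[pos[attr]]
            if value >> (nbits - k) != high:
                return False
        return True

    def supported(prefix):
        # Every relation must contain a tuple agreeing with the prefix.
        # Naive linear scan: an assumption made to keep the sketch short.
        return all(
            any(compatible(t, attrs, prefix) for t in tuples)
            for attrs, tuples in relations
        )

    def explore(prefix, depth):
        if not supported(prefix):
            return  # bound step: this branch cannot contain an answer
        if depth == n * nbits:
            answers.append(tuple(high for high, _ in prefix))
            return
        i = depth // nbits  # variable whose next bit is being fixed
        high, k = prefix[i]
        for bit in (0, 1):  # branch step: try both values of the next bit
            child = prefix[:i] + [((high << 1) | bit, k + 1)] + prefix[i + 1:]
            explore(child, depth + 1)

    explore([(0, 0)] * n, 0)
    return answers


# Triangle query Q(x, y, z) = R(x, y), S(y, z), T(x, z) on made-up data.
R = {(0, 1), (1, 2), (2, 3)}
S = {(1, 2), (2, 3), (3, 0)}
T = {(0, 2), (1, 3), (2, 1)}
answers = branch_and_bound_join(
    ["x", "y", "z"],
    [(("x", "y"), R), (("y", "z"), S), (("x", "z"), T)],
    nbits=2,
)
print(answers)  # [(0, 1, 2), (1, 2, 3)]
```

On this toy instance \(N_R = N_S = N_T = 3\), so the triangle bound gives \((3 \cdot 3 \cdot 3)^{1/2} = \sqrt{27} \approx 5.2\), and the two answers found are consistent with it. Branching on single bits rather than whole values keeps the fan-out of every node at two, which is the feature the summary credits for the simple analysis.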