DPconv: Super-Polynomially Faster Join Ordering

Mihail Stoian, Andreas Kipf
2024-09-12
Abstract: We revisit the join ordering problem in query optimization. The standard exact algorithm, DPccp, has a worst-case running time of $O(3^n)$. This is prohibitively expensive for large queries, which are not that uncommon anymore. We develop a new algorithmic framework based on subset convolution. DPconv achieves a super-polynomial speedup over DPccp, breaking the $O(3^n)$ time-barrier for the first time. We show that the instantiation of our framework for the $C_{\max}$ cost function is up to 30x faster than DPccp for large clique queries.
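The $O(3^n)$ bound comes from the subset dynamic program: for every subset $S$ of relations, every split of $S$ into two nonempty parts is tried, and summing $2^{|S|} - 2$ over all subsets gives $3^n - 2^{n+1} + 1$ ordered splits. A minimal sketch follows, assuming a clique query (every pair of relations is joined), independent predicate selectivities, and the $C_{\text{out}}$ cost (sum of intermediate result sizes, with base relations costing 0). It is a plain DPsub-style loop, not DPccp itself, although on a clique DPccp must also consider every such split. The function name `cout_subset_dp` and its input format are hypothetical.

```python
import math
from itertools import combinations


def cout_subset_dp(card, sel):
    """Exact C_out join ordering by dynamic programming over all subsets.

    card[i]     -- cardinality of relation i (hypothetical input format)
    sel[(i, j)] -- selectivity of the predicate between relations i < j
    Returns (optimal C_out for the full query, number of splits examined).
    """
    n = len(card)
    full = (1 << n) - 1

    # Estimated result size of each subset: product of cardinalities times the
    # selectivities of all predicates inside it (independence assumption).
    size = [0.0] * (1 << n)
    for s in range(1, 1 << n):
        members = [i for i in range(n) if s >> i & 1]
        est = math.prod(card[i] for i in members)
        for i, j in combinations(members, 2):
            est *= sel[(i, j)]
        size[s] = est

    # C_out(S) = |S| + min over splits (T, S \ T) of C_out(T) + C_out(S \ T).
    cost = [math.inf] * (1 << n)
    splits = 0
    for s in range(1, 1 << n):
        if s & (s - 1) == 0:        # single base relation
            cost[s] = 0.0
            continue
        best = math.inf
        t = (s - 1) & s             # walk all proper nonempty submasks of s
        while t:
            splits += 1
            best = min(best, cost[t] + cost[s ^ t])
            t = (t - 1) & s
        cost[s] = size[s] + best
    return cost[full], splits
```

For four relations the loop examines $3^4 - 2^5 + 1 = 50$ splits, and each additional relation roughly triples that count; this enumeration is the barrier that DPconv targets.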
Databases
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How can the long-standing \(O(3^n)\) time-complexity barrier be broken, so that join ordering in query optimization can be solved more efficiently?** Specifically, the standard exact algorithm DPccp runs in \(O(3^n)\) time in the worst case, which is prohibitively expensive for large queries. The authors propose DPconv, a new algorithmic framework based on subset convolution. DPconv achieves a super-polynomial speedup over DPccp (the gap between the two running times grows faster than any polynomial in \(n\)) and breaks the \(O(3^n)\) time barrier for the first time. It does so by applying fast subset convolution (FSC), and its \(C_{\text{max}}\) instantiation is up to 30 times faster than DPccp on large clique queries.

### Main contributions of the paper:

1. **Introduced a new exact algorithmic framework based on subset convolution**, breaking the long-standing \(O(3^n)\) time barrier for the first time.
2. **Provided a practical instantiation for the \(C_{\text{max}}\) cost function**, with a time complexity of \(O(2^n n^3)\).
3. **Proposed a \((1 + \epsilon)\)-approximation algorithm for \(C_{\text{out}}\)**, which runs in \(O(2^{3n/2}/\sqrt{\epsilon})\) time.
4. **Studied \(C_{\text{out}}\) and \(C_{\text{max}}\) jointly for the first time**, proposing a new cost function, together with an implementation, that combines the advantages of both.

### Key concepts and techniques:

- **Subset Convolution**: An important tool for speeding up the recurrences of dynamic programming over subsets. Evaluated directly, a subset convolution costs \(O(3^n)\); Fast Subset Convolution (FSC) reduces this to \(O(2^n n^2)\) ring operations.
- **Semiring**: A mathematical structure used to define different cost functions. For example, \(C_{\text{out}}\) operates in the \((\min, +)\) semiring, while \(C_{\text{max}}\) operates in the \((\min, \max)\) semiring.
- **Embedding Technique**: FSC relies on subtraction (through the Möbius transform), which semirings such as \((\min, +)\) lack. The semiring is therefore embedded into a ring using a polynomial representation, so that FSC can be applied.

Through these innovations, DPconv not only speeds up query optimization but also provides new ideas and tools for future research.
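The Subset Convolution and Embedding Technique bullets can be illustrated together. For functions \(f, g\) on subsets of an \(n\)-element set, the subset convolution is \(h(S) = \sum_{T \subseteq S} f(T)\, g(S \setminus T)\); evaluating it directly visits every pair \((S, T)\), i.e., \(3^n\) terms. The sketch below uses the ranked zeta/Möbius-transform construction of Björklund, Husfeldt, Kaski, and Koivisto, which needs \(O(2^n n^2)\) ring operations. Because the Möbius step subtracts, it cannot run directly in the \((\min, +)\) semiring, so `min_plus_subset_convolution` embeds small non-negative integer weights \(w\) as \(B^w\) (the polynomial \(x^w\) evaluated at an integer \(B\)) and reads the minimum off the lowest nonzero base-\(B\) digit. This is an illustration of the general technique under those assumptions, not DPconv's implementation: all function names are hypothetical, and how DPconv schedules convolutions inside its dynamic program and handles large costs is described in the paper.

```python
def _ranked_zeta(f, n, popcount):
    """fh[k][S] = sum of f(T) over all T subset of S with |T| == k."""
    size = 1 << n
    fh = [[f[s] if popcount[s] == k else 0 for s in range(size)]
          for k in range(n + 1)]
    for row in fh:
        for i in range(n):
            bit = 1 << i
            for s in range(size):
                if s & bit:
                    row[s] += row[s ^ bit]
    return fh


def fast_subset_convolution(f, g, n):
    """h(S) = sum over T subset of S of f(T) * g(S \\ T), in O(2^n n^2) ring ops."""
    size = 1 << n
    popcount = [bin(s).count("1") for s in range(size)]
    fh = _ranked_zeta(f, n, popcount)
    gh = _ranked_zeta(g, n, popcount)
    h = [0] * size
    for k in range(n + 1):
        # Ranked pointwise product: the sizes of the two parts must sum to k.
        row = [sum(fh[j][s] * gh[k - j][s] for j in range(k + 1))
               for s in range(size)]
        # Moebius transform (inverse of the zeta transform); needs subtraction.
        for i in range(n):
            bit = 1 << i
            for s in range(size):
                if s & bit:
                    row[s] -= row[s ^ bit]
        # Keep only sets of size k: pairs (A, B) with A | B == S and
        # |A| + |B| == |S| are exactly the disjoint splits of S.
        for s in range(size):
            if popcount[s] == k:
                h[s] = row[s]
    return h


def min_plus_subset_convolution(f, g, n):
    """(min, +) subset convolution; weights are ints >= 0, None means +infinity.

    Embedding: weight w becomes base ** w, i.e. the polynomial x ** w evaluated
    at x = base. After the ring convolution, the coefficient of base ** d in
    h(S) counts the splits T of S with f(T) + g(S \\ T) == d, so the minimum is
    the lowest exponent with a nonzero coefficient. base = 2 ** n + 1 exceeds
    any count (at most 2 ** n splits), so digits never carry into each other.
    """
    base = (1 << n) + 1

    def encode(w):
        return 0 if w is None else base ** w

    h = fast_subset_convolution([encode(w) for w in f], [encode(w) for w in g], n)
    result = []
    for v in h:
        if v == 0:              # no finite split: +infinity
            result.append(None)
            continue
        w = 0
        while v % base == 0:    # strip empty low-order digits
            v //= base
            w += 1
        result.append(w)
    return result


if __name__ == "__main__":
    f = [0, 3, 1, 5]  # weights for the subsets {}, {0}, {1}, {0, 1}
    g = [2, 0, 4, 1]
    print(min_plus_subset_convolution(f, g, 2))  # [2, 0, 3, 1]
```

Python's exact big integers keep every base-\(B\) digit intact, but \(B^w\) grows exponentially in \(w\), so this particular encoding is only practical for small weights.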