Efficient Set-Based Order Dependency Discovery with a Level-Wise Hybrid Strategy

Yihan Li, Ruifeng Li, Zijing Tan, Weidong Yang, Shuai Ma
DOI: https://doi.org/10.1109/icde60146.2024.00059
2024-01-01
Abstract: Order dependencies (ODs) state ordering specifications between attributes, and have been proven effective in query optimization for sorting operations. In this paper we investigate the problem of set-based OD discovery, which automatically finds hidden ODs in data. We tackle the problem with a novel level-wise hybrid strategy. Given a relational instance r, we discover ODs from a sample (subset) of r, validate the discovered ODs on r, and refine the sample by leveraging the validation, proceeding level by level through the lattice of set-based ODs. This process continues until the discovery result on the sample converges to that on r. We prove that a dynamic sample whose size keeps growing can be used in the process without affecting the correctness and completeness of the discovery result, and present techniques to incrementally refine the sample on demand. We also enhance our method with multi-threaded parallelism. On a host of datasets, our method is up to orders of magnitude faster than the state-of-the-art method even when the parallelism of our approach is disabled, and achieves up to a 4.5x self-relative parallel speedup with 6 threads.
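The sample, validate, refine loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm. It assumes only pairwise order compatibility with an empty context (A ~ B holds when no two tuples have s[A] < t[A] and s[B] > t[B]), it skips the level-wise lattice traversal and the parallelism, and the names `find_swap` and `discover` and the toy data are hypothetical.

```python
from itertools import combinations


def find_swap(rows, idxs, a, b):
    """Return indices (s, t) with rows[s][a] < rows[t][a] and
    rows[s][b] > rows[t][b], or None if a ~ b holds on idxs."""
    order = sorted(idxs, key=lambda i: rows[i][a])
    best = None  # row with the largest b among rows with strictly smaller a
    k = 0
    while k < len(order):
        j = k
        while j < len(order) and rows[order[j]][a] == rows[order[k]][a]:
            j += 1
        group = order[k:j]  # rows sharing the same value of a
        if best is not None:
            for i in group:
                if rows[i][b] < rows[best][b]:
                    return (best, i)
        gmax = max(group, key=lambda i: rows[i][b])
        if best is None or rows[gmax][b] > rows[best][b]:
            best = gmax
        k = j
    return None


def discover(rows, attrs, sample_size=2):
    """Toy hybrid loop: discover on a sample, validate on all rows,
    grow the sample with violating tuples, repeat until convergence."""
    sample = set(range(min(sample_size, len(rows))))
    candidates = set(combinations(attrs, 2))
    while True:
        # Discovery on the sample: a candidate violated here is violated on r.
        holding = {c for c in candidates
                   if find_swap(rows, sample, *c) is None}
        # Validation on the full instance; violations refine the sample.
        grew = False
        for a, b in holding:
            violation = find_swap(rows, range(len(rows)), a, b)
            if violation is not None:
                sample.update(violation)
                grew = True
        candidates = holding
        if not grew:  # the sample result now agrees with the full instance
            return sorted(holding)


# Hypothetical toy instance, for illustration only.
rows = [
    {"a": 1, "b": 10, "c": 4},
    {"a": 2, "b": 20, "c": 5},
    {"a": 3, "b": 30, "c": 6},
    {"a": 4, "b": 40, "c": 1},
]
print(discover(rows, ["a", "b", "c"]))
```

In this sketch the sample only ever grows, and each violating pair added to it eliminates at least one false candidate in the next round, so the loop terminates with exactly the candidates that hold on the full instance.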