Efficient Bidirectional Order Dependency Discovery

Yifeng Jin, Lin Zhu, Zijing Tan
DOI: https://doi.org/10.1109/ICDE48307.2020.00013
2020-01-01
Abstract: Bidirectional order dependencies state relationships of order between lists of attributes. They naturally model the order-by clauses in SQL queries, and have proved effective in query optimizations that involve sorting. Despite their importance, the order dependencies that hold on a dataset are typically unknown, and are too costly, if not impossible, to design or discover manually. Techniques for automatic order dependency discovery have recently been studied. It is challenging for order dependency discovery to scale well, since it is by nature factorial in the number m of attributes and quadratic in the number n of tuples. In this paper, we adopt a strategy that decouples the impact of m from that of n, and that still finds all minimal valid bidirectional order dependencies. We present carefully designed data structures, together with a host of algorithms and optimizations, for efficient order dependency discovery. With extensive experimental studies on both real-life and synthetic datasets, we verify that our approach outperforms state-of-the-art techniques by orders of magnitude.
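To make the notion of a bidirectional order dependency concrete, the sketch below checks whether one candidate dependency holds on a small table by comparing every ordered pair of tuples, which is where the quadratic cost in n comes from. It is a naive illustration only, not the algorithm of the paper: the names `sort_key` and `od_holds`, the `(attribute, direction)` encoding, and the sample rows are all assumptions made for this example, and the descending order is implemented by negation, so it assumes numeric values.

```python
from itertools import permutations


def sort_key(row, spec):
    """Build a comparable key for `row` under a list of (attribute, direction) pairs.

    Assumes numeric values: negating a value reverses its order, which is how
    a "desc" attribute is handled here.
    """
    return tuple(row[a] if d == "asc" else -row[a] for a, d in spec)


def od_holds(rows, lhs, rhs):
    """Naively check the order dependency lhs -> rhs on `rows`.

    The dependency holds if, for every ordered pair of tuples (s, t),
    s <= t under `lhs` implies s <= t under `rhs` (lexicographic order).
    All ordered pairs are compared, so the cost is quadratic in len(rows).
    """
    for s, t in permutations(rows, 2):
        if sort_key(s, lhs) <= sort_key(t, lhs) and not (
            sort_key(s, rhs) <= sort_key(t, rhs)
        ):
            return False
    return True


# Hypothetical sample data: tax grows with salary, rank shrinks with salary.
rows = [
    {"salary": 1000, "tax": 100, "rank": 3},
    {"salary": 2000, "tax": 250, "rank": 2},
    {"salary": 3000, "tax": 450, "rank": 1},
]

same_direction = od_holds(rows, [("salary", "asc")], [("tax", "asc")])
mixed_direction = od_holds(rows, [("salary", "asc")], [("rank", "desc")])
violated = od_holds(rows, [("salary", "asc")], [("rank", "asc")])
```

On this data, ordering by salary ascending also orders by tax ascending and by rank descending, so the first two checks succeed while the third fails. A discovery procedure would have to run such checks over candidate attribute lists, whose number grows factorially in m, which is why the abstract treats the two cost factors separately.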