Approximate Order Dependency Discovery

Yifeng Jin, Zijing Tan, Weijun Zeng, Shuai Ma
DOI: https://doi.org/10.1109/ICDE51399.2021.00010
2021-01-01
Abstract: Lexicographical order dependencies (ODs) specify orders between lists of attributes and have proven useful in optimizing SQL queries with ORDER BY clauses. To find hidden ODs in dirty real-world data, this paper makes a first effort to study the approximate OD discovery problem, which aims to automatically discover ODs that hold on the data with some exceptions. (1) We adapt two error measures to ODs, prove their desirable properties, and present efficient algorithms for computing the measures and their lower and upper bounds. (2) We present an efficient approximate OD discovery algorithm that is well suited to the two error measures, with a set of pruning rules and optimization techniques. (3) We conduct extensive experiments on real-life and synthetic data to verify the effectiveness and scalability of our methods.
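To make the notion of an OD that holds "with some exceptions" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes one plausible pair-based error measure (the fraction of ordered tuple pairs that violate the dependency); the abstract does not name the two measures it adapts, so this choice is an assumption. The function names, the sample table, and the threshold are all hypothetical, and the quadratic scan is a naive stand-in for the efficient algorithms the abstract mentions.

```python
from itertools import permutations


def project(row, attrs):
    """Project a row (dict) onto an attribute list; tuples compare lexicographically."""
    return tuple(row[a] for a in attrs)


def violating_pairs(rows, lhs, rhs):
    """Count ordered pairs (s, t) where s <= t on lhs but s > t on rhs.

    Such a pair contradicts the OD lhs -> rhs, which requires that ordering
    by lhs also orders the tuples by rhs.
    """
    count = 0
    for s, t in permutations(rows, 2):
        if project(s, lhs) <= project(t, lhs) and project(s, rhs) > project(t, rhs):
            count += 1
    return count


def pair_error(rows, lhs, rhs):
    """Fraction of ordered tuple pairs that violate lhs -> rhs (assumed measure)."""
    n = len(rows)
    total = n * (n - 1)
    return violating_pairs(rows, lhs, rhs) / total if total else 0.0


def holds_approximately(rows, lhs, rhs, threshold):
    """True if the assumed error measure does not exceed the given threshold."""
    return pair_error(rows, lhs, rhs) <= threshold


# Hypothetical table: tax should grow with salary, but one row is dirty.
rows = [
    {"salary": 1000, "tax": 100},
    {"salary": 2000, "tax": 200},
    {"salary": 3000, "tax": 150},  # dirty value: breaks the order
    {"salary": 4000, "tax": 400},
]

print(violating_pairs(rows, ["salary"], ["tax"]))
print(holds_approximately(rows, ["salary"], ["tax"], threshold=0.1))
print(holds_approximately(rows, ["salary"], ["tax"], threshold=0.0))
```

In this sample, the exact OD from salary to tax fails because of the single dirty row, yet it is accepted once a small error threshold is allowed, which is the kind of dependency approximate discovery is meant to surface.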