Discovery of Approximate Lexicographical Order Dependencies

Yifeng Jin, Zijing Tan, Jixuan Chen, Shuai Ma
DOI: https://doi.org/10.1109/tkde.2021.3130227
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Lexicographical order dependencies (LODs) specify orders between lists of attributes, and have proven useful in optimizing SQL queries with ORDER BY clauses. To discover hidden dependencies from dirty data in practice, approximate dependency discovery is actively studied, with the aim of automatically discovering dependencies that hold on data with some exceptions. In this paper we study the discovery of approximate LODs. (1) We adapt two error measures, namely g1 and g3, to LODs. We prove their desirable properties, present efficient algorithms for computing the measures and related lower and upper bounds, and study the relationship between the two measures. (2) We present an efficient approximate LOD discovery algorithm that is well suited to the two error measures, with a set of pruning rules, optimization techniques and ranking functions. (3) We study techniques for estimating g1 by sampling that achieve high accuracy in far less time. (4) We conduct extensive experiments on both real-life and synthetic data to verify the effectiveness and scalability of our methods.
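To make the two error measures concrete, here is a minimal brute-force sketch, not the paper's efficient algorithms. It assumes the usual LOD semantics: X ↦ Y holds iff no pair of tuples forms a split (equal on X, different on Y) or a swap (ordered one way on X, the other way on Y), with attribute lists compared lexicographically. It further assumes that g1 is the fraction of violating unordered tuple pairs (the paper's exact normalisation may differ) and that g3 is the minimum fraction of tuples to delete so the LOD holds. The sampled estimator simply draws uniform random pairs and is only an illustration, not the paper's sampling technique. All function names are hypothetical, and rows are Python tuples indexed by attribute position.

```python
import random
from itertools import combinations


def key(row, attrs):
    """Project a row onto an attribute list; Python compares tuples lexicographically."""
    return tuple(row[a] for a in attrs)


def violates(s, t, X, Y):
    """True if (s, t) is a split or a swap for the LOD X -> Y (assumed semantics)."""
    sx, tx, sy, ty = key(s, X), key(t, X), key(s, Y), key(t, Y)
    if sx == tx:
        return sy != ty          # split: equal on X but different on Y
    if sx > tx:                  # normalise so that s precedes t on X
        sy, ty = ty, sy
    return sy > ty               # swap: s before t on X but after t on Y


def g1(rows, X, Y):
    """Fraction of unordered tuple pairs violating X -> Y (brute force, O(n^2))."""
    n = len(rows)
    if n < 2:
        return 0.0
    bad = sum(violates(s, t, X, Y) for s, t in combinations(rows, 2))
    return bad / (n * (n - 1) / 2)


def g3(rows, X, Y):
    """Minimum fraction of tuples to delete so that X -> Y holds exactly."""
    n = len(rows)
    if n == 0:
        return 0.0
    # Tuples sharing both X and Y values never conflict, so keep or drop them together.
    counts = {}
    for r in rows:
        k = (key(r, X), key(r, Y))
        counts[k] = counts.get(k, 0) + 1
    items = sorted(counts.items())  # ordered by X value, then Y value
    # A valid subset keeps one Y value per X group, with Y non-decreasing as X grows;
    # best[i] is the largest such subset whose last kept item is items[i].
    best = []
    for i, ((xi, yi), ci) in enumerate(items):
        prev = max((best[j] for j, ((xj, yj), _) in enumerate(items[:i])
                    if xj < xi and yj <= yi), default=0)
        best.append(ci + prev)
    return (n - max(best)) / n


def g1_sampled(rows, X, Y, k, seed=0):
    """Monte Carlo estimate of g1 from k uniformly sampled tuple pairs."""
    n = len(rows)
    if n < 2:
        return 0.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(k):
        i, j = rng.sample(range(n), 2)  # two distinct tuple indices
        hits += violates(rows[i], rows[j], X, Y)
    return hits / k


if __name__ == "__main__":
    # Attribute 0 plays the role of X, attribute 1 the role of Y.
    rows = [(1, 10), (2, 20), (3, 15), (3, 15), (4, 40)]
    print(g1(rows, [0], [1]))  # 0.2: 2 of the 10 pairs are swaps
    print(g3(rows, [0], [1]))  # 0.2: deleting (2, 20) restores the LOD
```

Counting each distinct (X, Y) combination once turns g3 into a weighted longest non-decreasing chain over the groups, which is why the sketch needs no search over tuple subsets. The quadratic loops are fine for illustration but would not scale to the data sizes the paper targets.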