DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data
Xiaoou Ding, Yixing Lu, Hongzhi Wang, Chen Wang, Yida Liu, Jianmin Wang
DOI: https://doi.org/10.14778/3681954.3682015
IF: 2.5
2024-07-01
Proceedings of the VLDB Endowment
Abstract: Data dependency mining plays a crucial role in understanding data relationships. To address the increasing complexity of real-world data, Approximate Functional Dependencies (AFDs) have been introduced as a relaxation of traditional Functional Dependencies (FDs). However, existing AFD approaches use static relaxation coefficients, which limits their effectiveness in capturing dependencies in noisy data. We propose a dynamic AFD variant, DAFD, which incorporates attribute error rates. We establish a bijection between DAFDs and FDs, develop an inference system for DAFDs, and introduce DAFDiscover, an algorithm for mining dependencies directly on noisy data. DAFDiscover matches the time and space complexity of state-of-the-art (SOTA) AFD mining methods while offering superior performance. We theoretically prove its correctness, provide a method for calculating DAFD probabilities (DAFD-prob), and derive a lower bound for the validity of DAFDs on dirty data. Experimental results on multiple public datasets demonstrate the semantic superiority of DAFD and the effectiveness of DAFDiscover compared to existing SOTA AFD mining techniques.
computer science, information systems, theory & methods
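
To make the "static relaxation coefficient" in the abstract concrete, the sketch below checks a conventional AFD against a fixed threshold. It assumes the widely used g3 error measure (the smallest fraction of rows that must be removed for the FD to hold exactly); the abstract does not say which measure the paper uses. The names `g3_error`, `holds_as_afd`, and `epsilon`, and the sample rows, are hypothetical. This is not DAFDiscover: the abstract does not specify how DAFD derives its dynamic threshold from attribute error rates, so that step is left out.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Return the g3 error of the candidate dependency lhs -> rhs.

    g3 is the fraction of rows that would have to be removed so that
    every combination of lhs values maps to a single rhs value.
    """
    if not rows:
        return 0.0
    # For each lhs value combination, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # Keep the most frequent rhs value in each group; the rest are violations.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


def holds_as_afd(rows, lhs, rhs, epsilon):
    """Static-threshold AFD check: accept if the g3 error is at most epsilon."""
    return g3_error(rows, lhs, rhs) <= epsilon


# Hypothetical dirty table: one misspelled city breaks the exact FD zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Nwe York"},
    {"zip": "60601", "city": "Chicago"},
]

print(g3_error(rows, ["zip"], "city"))
print(holds_as_afd(rows, ["zip"], "city", epsilon=0.3))
print(holds_as_afd(rows, ["zip"], "city", epsilon=0.1))
```

With one fixed `epsilon` for every candidate dependency, the same table accepts or rejects zip -> city purely according to the chosen constant, regardless of how noisy the `city` column actually is. This is the limitation the abstract attributes to static relaxation coefficients.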