Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration

Renjie Xiao, Yong'an Yuan, Zijing Tan, Shuai Ma, Wei Wang
DOI: https://doi.org/10.1109/ICDE53745.2022.00026
2022-01-01
Abstract: Functional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set $\Sigma$ of minimal and valid FDs on a relational instance $r$, we aim to find the complete set $\Sigma'$ of minimal and valid FDs on $r \oplus \Delta r$, where $\Delta r$ is a set of tuple insertions and deletions. Unlike batch approaches that compute $\Sigma'$ on $r \oplus \Delta r$ from scratch, our dynamic method computes $\Sigma'$ in response to $\Delta r$ by leveraging the known $\Sigma$ on $r$, and avoids processing the whole of $r$ for each update from $\Delta r$. We tackle dynamic FD discovery on $r \oplus \Delta r$ by dynamic hitting set enumeration on the difference-set of $r \oplus \Delta r$. Specifically, (1) leveraging auxiliary structures built on $r$, we first present an efficient algorithm to update the difference-set of $r$ to that of $r \oplus \Delta r$. (2) We then compute $\Sigma'$ by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of $r \oplus \Delta r$ and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when $\Delta r$ is up to 30% of $r$.
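The connection between FDs and hitting sets on the difference-set can be illustrated with a minimal sketch. It is not the paper's algorithm: the helper names (`diff`, `insert_tuple`, `delete_tuple`, `minimal_hitting_sets`, `minimal_fds`) are hypothetical, the difference-set is maintained by a naive scan over all stored tuples rather than with the auxiliary structures the abstract mentions, and the hitting sets are re-enumerated by brute force after each update instead of dynamically. The sketch assumes attributes are identified by column index and that an FD X → A is valid exactly when X intersects every difference set that contains A.

```python
from collections import Counter
from itertools import combinations

def diff(t1, t2):
    """Attribute indices on which two tuples disagree."""
    return frozenset(i for i, (a, b) in enumerate(zip(t1, t2)) if a != b)

def insert_tuple(rows, counts, t):
    """Naive update: only pairs involving the new tuple add difference sets."""
    for s in rows:
        d = diff(s, t)
        if d:
            counts[d] += 1
    rows.append(t)

def delete_tuple(rows, counts, t):
    """Naive update: decrement the pairs that involved t, dropping zero counts."""
    rows.remove(t)
    for s in rows:
        d = diff(s, t)
        if d:
            counts[d] -= 1
            if counts[d] == 0:
                del counts[d]

def minimal_hitting_sets(universe, family):
    """Brute force: scan candidates by increasing size, skip supersets of hits."""
    found = []
    for k in range(len(universe) + 1):
        for cand in combinations(sorted(universe), k):
            c = frozenset(cand)
            if any(m <= c for m in found):
                continue  # a smaller hitting set is contained in c
            if all(c & s for s in family):
                found.append(c)
    return found

def minimal_fds(counts, n_attrs):
    """Minimal valid FDs (lhs, rhs) derived from the current difference-set."""
    fds = set()
    for a in range(n_attrs):
        # Pairs that differ on a must also differ on some attribute of the LHS.
        family = [d - {a} for d in counts if a in d]
        universe = set(range(n_attrs)) - {a}
        for lhs in minimal_hitting_sets(universe, family):
            fds.add((lhs, a))
    return fds

# Small usage example on a three-attribute relation.
rows, counts = [], Counter()
for t in [(1, "a", "x"), (1, "b", "y"), (2, "a", "x")]:
    insert_tuple(rows, counts, t)
before = minimal_fds(counts, 3)   # FDs on r
insert_tuple(rows, counts, (2, "b", "x"))
after = minimal_fds(counts, 3)    # FDs after one insertion
```

Keeping a count per difference set is what makes deletions possible in this sketch: a difference set disappears only when the last pair of tuples producing it is gone. The brute-force enumeration is exponential in the number of attributes and is meant only to make the FD-to-hitting-set correspondence concrete.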