Discovering Editing Rules by Deep Reinforcement Learning.

Yinan Mei, Shaoxu Song, Chenguang Fang, Ziheng Wei, Jingyun Fang, Jiang Long
DOI: https://doi.org/10.1109/icde55515.2023.00034
2023-01-01
Abstract: Editing rules specify the conditions under which high-quality master data can be applied to repair low-quality input data. Discovering editing rules is challenging, however, because it must consider not only the well-curated master data but also the large-scale input data, which yields an extremely large search space. A natural baseline, EnuMiner, enumerates rules with the possible conditions from both master and input data, which is costly. Although several pruning strategies are applied, the algorithm still takes a long time when the enumeration space is large. To avoid enumerating all candidate rules during mining, we propose modeling the rule discovery process as a Markov Decision Process. Specifically, we discover editing rules by growing a rule tree in which each node corresponds to a rule, and the algorithm generates a new rule from the current node as a child node. We propose a reinforcement learning-based editing rule discovery algorithm, RLMiner, which trains an agent to decide which branches to follow when traversing the tree. Following the idea of rule evaluation, we design a reward function that is better aligned with rule discovery scenarios and makes our algorithm both effective and efficient. Experimental results show that RLMiner can mine high-utility editing rules, as EnuMiner does, and scales well on datasets with many attributes and large domains.
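The rule tree described in the abstract can be illustrated with a minimal sketch. This is not the authors' RLMiner implementation: the names RuleNode, expand, and greedy_walk, the representation of a rule as a set of (input attribute, master attribute) pairs, and the toy scoring weights are all assumptions made for illustration. A greedy choice over a hand-written score stands in for the trained agent and its reward function.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical rule condition: match an input attribute against a master attribute.
Condition = Tuple[str, str]


@dataclass
class RuleNode:
    """One node of the rule tree; each node corresponds to one rule."""
    conditions: FrozenSet[Condition]
    children: List["RuleNode"] = field(default_factory=list)


def expand(node: RuleNode, candidates: List[Condition]) -> List[RuleNode]:
    """Generate child rules by adding one unused condition to the current rule."""
    node.children = [
        RuleNode(node.conditions | {c})
        for c in candidates
        if c not in node.conditions
    ]
    return node.children


def greedy_walk(
    root: RuleNode,
    candidates: List[Condition],
    score: Callable[[FrozenSet[Condition]], float],
    max_depth: int,
) -> List[RuleNode]:
    """Follow the best-scoring child at each level.

    Assumption: this greedy policy is only a placeholder for a learned agent,
    so that only one branch per level is explored instead of all of them.
    """
    path = [root]
    node = root
    for _ in range(max_depth):
        children = expand(node, candidates)
        if not children:
            break
        node = max(children, key=lambda n: score(n.conditions))
        path.append(node)
    return path


# Toy data: attribute pairs and made-up utility weights (not from the paper).
candidates = [("zip", "zip"), ("city", "city"), ("phone", "tel")]
weights = {("zip", "zip"): 3.0, ("phone", "tel"): 2.0, ("city", "city"): 1.0}


def toy_score(conditions: FrozenSet[Condition]) -> float:
    return sum(weights[c] for c in conditions)


root = RuleNode(frozenset())
path = greedy_walk(root, candidates, toy_score, max_depth=2)
for depth, node in enumerate(path):
    print(depth, sorted(node.conditions))
```

The walk visits one child per level rather than enumerating every rule, which is the contrast the abstract draws between traversing the tree with an agent and exhaustive enumeration.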