Extending Matching Rules with Conditions

Shaoxu Song
2010-01-01
Abstract: Matching dependencies (MDs) have recently been proposed [10] to make dependencies tolerant to various information representations, and have proved [13] useful in data quality applications such as record matching. Instead of the strict identification function used in traditional dependency syntax (e.g., functional dependencies), MDs specify dependencies based on similarity matching quality. In practice, however, MDs may still be too strict and hold only in a subset of the tuples in a relation. We therefore study MDs conditioned on a subset of tuples, called conditional matching dependencies (CMDs), which bind matching dependencies only to a certain part of a table. Compared to MDs, CMDs have more expressive power, which enables them to satisfy wider application needs. In this paper, we study several important theoretical and practical issues of CMDs, including the inference of CMDs, irreducible CMDs with less redundancy, and the discovery of CMDs from data. Through an extensive experimental evaluation on real data sets, we demonstrate the efficiency of the proposed CMD discovery algorithms.
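To make the distinction between an MD and a CMD concrete, the following is a minimal sketch and not the paper's formalism or algorithms. It assumes a simplified reading of an MD (whenever two tuples are similar on the left-hand attributes, they should also be similar on the right-hand attribute) and models the condition of a CMD as a set of constant attribute values that selects the tuples the dependency is checked on. The function names `similar` and `md_holds`, the similarity measure (`difflib.SequenceMatcher`), the threshold of 0.8, and the sample table are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    """Assumed similarity test: SequenceMatcher ratio at or above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def md_holds(rows, lhs, rhs, threshold=0.8, condition=None):
    """Check a simplified matching dependency lhs -> rhs on a list of dict rows.

    If `condition` is given (attribute -> constant), only rows matching it are
    checked, which is the illustrative stand-in for a conditional MD.
    """
    scope = [
        r for r in rows
        if condition is None or all(r[k] == v for k, v in condition.items())
    ]
    for t1, t2 in combinations(scope, 2):
        # Tuples that are similar on every left-hand attribute ...
        if all(similar(t1[a], t2[a], threshold) for a in lhs):
            # ... must also be similar on the right-hand attribute.
            if not similar(t1[rhs], t2[rhs], threshold):
                return False
    return True


# Hypothetical sample table, invented for this sketch.
rows = [
    {"region": "US", "name": "Jon Smith", "phone": "555-0100"},
    {"region": "US", "name": "John Smith", "phone": "555-0100"},
    {"region": "UK", "name": "Jon Smith", "phone": "020-7946"},
]

holds_everywhere = md_holds(rows, ["name"], "phone")
holds_in_us = md_holds(rows, ["name"], "phone", condition={"region": "US"})
```

In this toy table the dependency from `name` to `phone` fails over the whole relation, because the UK tuple shares a name with a US tuple but has an unrelated phone number, while it holds once the check is restricted to tuples with `region = "US"`. This is the situation the abstract describes: a dependency that is too strict for the full relation but valid on a part of it.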