Optimized Algorithm Of Discovering Functional Dependencies With Degrees Of Satisfaction Based On Attribute Pre-Scanning Operation
Qiang Wei, Guoqing Chen, Xiaocang Zhou
DOI: https://doi.org/10.1007/978-3-540-70812-4_24
2008-01-01
Abstract: Functional dependency (FD) is an important type of semantic knowledge that reflects integrity constraints in databases. Traditionally, FDs are specified by managers or domain experts, which is regarded as a logic-driven method. FDs have recently attracted increasing research attention in data mining, and many efforts have been made to discover them automatically in large-scale databases. Two major problems arise in mining FDs. First, massive databases often contain imprecise or noisy data, which can cause FDs that would otherwise hold to be missed. Second, efficiently discovering the so-called minimal set of FDs remains an open issue. To tolerate partial truth caused by imprecise or incomplete data, or by a very small, insignificant number of tuple differences in a huge volume of data, the notion of functional dependency with degree of satisfaction, denoted (FD)(d), was proposed in [32], along with Armstrong-like properties and the concept of a minimal set. Moreover, the efficient mining algorithm MFDD was proposed in [29, 30, 33]; it uses inference rules to improve the efficiency of the mining process and discovers the minimal set of satisfied (FD)(d). Building on the MFDD algorithm, this paper proposes the concept of the degree of diversity of an attribute, which is proved to be consistent with the framework of degree of satisfaction. Several important properties and optimization strategies are also presented. By measuring the degrees of diversity of attributes in a pre-scanning operation, many (FD)(d) can be determined to be satisfied or dissatisfied using these strategies. This substantially reduces the computational cost of further database scans in the MFDD algorithm and thereby improves the efficiency of the whole mining algorithm.
Experimental results show that the optimization strategies significantly improve computational efficiency. Finally, concluding remarks and directions for future work are presented.
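
The pre-scanning idea can be illustrated with a minimal Python sketch. It is not the MFDD algorithm, and it rests on assumptions that the abstract does not spell out: the degree of satisfaction of X -> Y is taken to be the fraction of tuple pairs that do not violate the dependency, and the degree of diversity of an attribute is taken to be the fraction of tuple pairs that differ on it. Under these assumed definitions, the diversities alone bound the degree of satisfaction, so some candidates may be accepted or rejected against a threshold without a further scan. All function names and the sample relation are hypothetical.

```python
from itertools import combinations


def degree_of_diversity(rows, attr):
    """Fraction of tuple pairs that differ on `attr` (assumed definition)."""
    pairs = list(combinations(rows, 2))
    return sum(1 for s, t in pairs if s[attr] != t[attr]) / len(pairs)


def degree_of_satisfaction(rows, x, y):
    """Fraction of tuple pairs that do not violate x -> y (assumed definition).

    A pair violates x -> y when it agrees on x but differs on y.
    """
    pairs = list(combinations(rows, 2))
    ok = sum(1 for s, t in pairs if not (s[x] == t[x] and s[y] != t[y]))
    return ok / len(pairs)


def prescan_bounds(div_x, div_y):
    """Bounds on the degree of satisfaction of x -> y from diversities only.

    Lower bound: pairs differing on x, or agreeing on y, cannot violate x -> y.
    Upper bound: at least (div_y - div_x) of the pairs differ on y while
    agreeing on x, so they must violate x -> y.
    """
    lower = max(div_x, 1.0 - div_y)
    upper = min(1.0, 1.0 - (div_y - div_x))
    return lower, upper


# Hypothetical sample relation: one department has two different managers.
rows = [
    {"dept": "A", "mgr": "Ann"},
    {"dept": "A", "mgr": "Ann"},
    {"dept": "B", "mgr": "Bob"},
    {"dept": "B", "mgr": "Bea"},
]

div_dept = degree_of_diversity(rows, "dept")
div_mgr = degree_of_diversity(rows, "mgr")
lower, upper = prescan_bounds(div_dept, div_mgr)
exact = degree_of_satisfaction(rows, "dept", "mgr")
print(lower, exact, upper)
```

With a satisfaction threshold at or below the lower bound, the candidate dept -> mgr could be accepted from the pre-scan alone; with a threshold above the upper bound, it could be rejected. Only candidates whose threshold falls between the two bounds would need the exact pairwise computation.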