Abstract: A functional dependency (FD) is an important type of semantic knowledge that reflects integrity constraints in databases. Traditionally, FDs are specified by managers or domain experts, which is regarded as a logic-driven method. FDs have recently attracted increasing research attention in data mining, and many efforts have been made to discover FDs automatically in large-scale databases. Two major problems arise in mining FDs. First, massive databases often contain imprecise or noisy data, which causes FDs that would otherwise hold to be missed. Second, efficiently discovering the so-called minimal set of FDs remains an open issue. To tolerate partial truth caused by imprecise or incomplete data, or by a very small, insignificant number of tuple differences in a huge volume of data, the notion of functional dependency with degree of satisfaction, denoted (FD)_d, was proposed in [32], along with Armstrong-like properties and the concept of a minimal set. Moreover, the efficient mining algorithm MFDD was proposed in [29, 30, 33]; it uses inference rules to improve the efficiency of the mining process and discovers the minimal set of satisfied (FD)_d. Building on the MFDD algorithm, this paper proposes the concept of the degree of diversity of an attribute, which is proved to be consistent with the framework of degree of satisfaction. Some important properties and optimization strategies are also presented. By measuring the degrees of diversity of attributes in a pre-scanning operation, many (FD)_d can be determined to be satisfied or dissatisfied using these strategies. This substantially reduces the computation needed for subsequent database scans in the MFDD algorithm and thereby improves the efficiency of the whole mining algorithm. Experimental results show that the optimization strategies significantly improve computational efficiency. Finally, concluding remarks and directions for future work are presented.
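
The abstract does not give the formal definitions, so the following is only a minimal sketch under stated assumptions. It assumes the degree of satisfaction of X -> Y is the fraction of tuple pairs that do not violate the dependency, and that the degree of diversity of an attribute set is the fraction of tuple pairs that differ on it. The function names and the sample relation are hypothetical and are not taken from MFDD. Under these assumed definitions, any pair that differs on X satisfies X -> Y trivially, so the diversity of X is a lower bound on the degree of satisfaction, which illustrates why a pre-scan of attributes might settle some dependencies early.

```python
from itertools import combinations


def degree_of_satisfaction(rows, lhs, rhs):
    """Assumed definition: share of tuple pairs that do not violate lhs -> rhs."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 1.0  # no pair can violate the dependency
    satisfied = 0
    for t, u in pairs:
        same_lhs = all(t[a] == u[a] for a in lhs)
        same_rhs = all(t[a] == u[a] for a in rhs)
        # A pair violates the FD only if it agrees on lhs but differs on rhs.
        if not same_lhs or same_rhs:
            satisfied += 1
    return satisfied / len(pairs)


def degree_of_diversity(rows, attrs):
    """Assumed definition: share of tuple pairs that differ on attrs."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0  # conservative value, keeps the lower-bound property
    differing = sum(1 for t, u in pairs if any(t[a] != u[a] for a in attrs))
    return differing / len(pairs)


# Hypothetical sample relation with one noisy pair on attribute A.
rows = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "y"},
    {"A": 2, "B": "x"},
    {"A": 3, "B": "z"},
]

diversity_a = degree_of_diversity(rows, ["A"])
satisfaction_ab = degree_of_satisfaction(rows, ["A"], ["B"])
print(diversity_a, satisfaction_ab)
```

With a satisfaction threshold chosen by the user, a dependency whose left-hand side already has diversity at or above the threshold could be accepted from the pre-scan alone in this sketch, without evaluating the right-hand side.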

Discovering Approximate Functional Dependencies from Distributed Big Data

Scalable Functional Dependencies Discovery from Big Data

Distributed Affinity Propagation Clustering Based on MapReduce

Efficient and Scalable Functional Dependency Discovery on Distributed Data-Parallel Platforms

DAFDiscover: Robust Mining Algorithm for Dynamic Approximate Functional Dependencies on Dirty Data

Efficient Discovery of Functional Dependencies with Degrees of Satisfaction

Approximate Functional Dependencies Based Query Evaluation Improvement

Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration

Fuzzy DR Algorithm for Data Distribution Management

Discovering Reliable Approximate Functional Dependencies

Optimized Algorithm Of Discovering Functional Dependencies With Degrees Of Satisfaction Based On Attribute Pre-Scanning Operation

A fuzzy grouping mechanism for distributed interactive simulation

EulerFD: an Efficient Double-Cycle Approximation of Functional Dependencies

Relaxed Functional Dependency Discovery in Heterogeneous Data Lakes

Efficient Relaxed Functional Dependency Discovery with Minimal Set Cover

OPTIMIZED ALGORITHM OF DISCOVERING FUNCTIONAL DEPENDENCIES WITH DEGREES OF SATISFACTION

Using Conditional Functional Dependency to Discover Abnormal Data in RDF Graphs

Repairing Functional Dependency Violations In Distributed Data

Mining Approximate Acyclic Schemes from Relations

Efficient Differential Dependency Discovery

Properties and Pre-Processing Strategies to Enhance the Discovery of Functional Dependency with Degree of Satisfaction