Characterizing Data Dependencies Then and Now

Phokion G. Kolaitis, Andreas Pieris
2024-08-02
Abstract: Data dependencies are integrity constraints that the data of interest must obey. During the 1980s, Janos Makowsky made a number of contributions to the study of data dependencies; in particular, he was the first researcher to characterize data dependencies in terms of their structural properties. The goal of this article is to first present an overview of Makowsky's work on characterizing certain classes of data dependencies and then discuss recent developments concerning characterizations of broader classes of data dependencies.
Logic in Computer Science, Databases
What problem does this paper attempt to address?
This paper addresses the characterization of data dependencies, that is, the formal description of different classes of data dependencies from a logical and mathematical perspective. Specifically:

1. **Historical Background and Problem Definition**:
   - Since E.F. Codd introduced the relational data model in 1970, logic and databases have interacted closely. Data dependencies are the integrity constraints that the data must obey.
   - Data dependencies come in several types, such as Functional Dependencies (FDs), Inclusion Dependencies (INDs), Join Dependencies (JDs), and Multivalued Dependencies (MVDs). Studying them calls for a unified formal framework.
2. **János Makowsky's Contribution**:
   - In the 1980s, János Makowsky studied data dependencies in depth and was the first to characterize certain classes of data dependencies in terms of their structural properties.
   - He also studied the implication problem for data dependencies: given a set of data dependencies and a further dependency, decide whether the set logically implies that dependency. Makowsky proved that the implication problem for Embedded Implicational Dependencies (EIDs) is undecidable.
3. **Modern Development and New Challenges**:
   - In recent years, researchers have extended Makowsky's work to characterize broader classes of data dependencies. In particular, for multi-relational databases, they ask which classes of databases can be axiomatized by full TGDs (full Tuple-Generating Dependencies) and EGDs (Equality-Generating Dependencies).
   - Recent work introduces a new locality property that handles arbitrary TGDs and EGDs and gives conditions for finite axiomatizability.
4. **Main Results**:
   - The paper's main objective is to review Makowsky's work and discuss recent progress on broader classes of data dependencies. In particular, it details how closure properties and locality conditions characterize the classes of databases axiomatized by full TGDs and EGDs.

In summary, the paper combines a review of earlier research with recent methods to give a systematic way of understanding and characterizing data dependencies, providing a theoretical foundation for database theory and applications.
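
To make the two dependency classes named above concrete, here is a minimal Python sketch of what it means for a database to satisfy an EGD and a full TGD. It is an illustration only and is not taken from the paper: the helper names `satisfies_fd` and `satisfies_transitivity`, the `employees` relation, and the choice of example dependencies are all assumptions. A functional dependency is used as the EGD (two tuples that agree on the left-hand attributes must have equal right-hand values), and transitivity of a binary relation, E(x,y) and E(y,z) implies E(x,z), is used as the full TGD (no existential quantifier in the conclusion).

```python
from itertools import product


def satisfies_fd(rows, lhs, rhs):
    """EGD example (hypothetical helper): check the functional dependency lhs -> rhs.

    `rows` is an iterable of dicts; `lhs` and `rhs` are lists of attribute names.
    Returns True if all rows that agree on `lhs` also agree on `rhs`.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value for the same key violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies_transitivity(edges):
    """Full TGD example (hypothetical helper): E(x,y) and E(y,z) implies E(x,z).

    `edges` is an iterable of pairs. Returns True if every two-step path
    has a matching direct edge.
    """
    edges = set(edges)
    return all(
        (x, z) in edges
        for (x, y), (y2, z) in product(edges, repeat=2)
        if y == y2
    )


# Illustrative data (made up for this sketch).
employees = [
    {"emp": "ann", "dept": "db", "floor": 3},
    {"emp": "bob", "dept": "db", "floor": 3},
    {"emp": "eve", "dept": "ai", "floor": 5},
]

print(satisfies_fd(employees, ["dept"], ["floor"]))   # dept determines floor here
print(satisfies_transitivity({(1, 2), (2, 3)}))       # missing the edge (1, 3)
```

Checking whether one given database satisfies a dependency, as done here, is a much simpler task than the implication problem discussed above, which asks about all databases at once.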