Datalog with First-Class Facts

Thomas Gilray, Arash Sahebolamri, Yihao Sun, Sowmith Kunapaneni, Sidharth Kumar, Kristopher Micinski
2024-11-22
Abstract: Datalog is a popular logic programming language for deductive reasoning tasks in a wide array of applications, including business analytics, program analysis, and ontological reasoning. However, Datalog's restriction to flat facts over atomic constants leads to challenges in working with tree-structured data, such as derivation trees or abstract syntax trees. To ameliorate Datalog's restrictions, popular extensions of Datalog support features such as existential quantification in rule heads (Datalog$^\pm$, Datalog$^\exists$) or algebraic data types (Soufflé). Unfortunately, these are imperfect solutions for reasoning over structured and recursive data types, with general existentials leading to complex implementations requiring unification, and ADTs unable to trigger rule evaluation and failing to support efficient indexing. We present DL$^{\exists!}$, a Datalog with first-class facts, wherein every fact is identified with a Skolem term unique to the fact. We show that this restriction offers an attractive price point for Datalog-based reasoning over tree-shaped data, demonstrating its application to databases, artificial intelligence, and programming languages. We implemented DL$^{\exists!}$ as a system Slog, which leverages the uniqueness restriction of DL$^{\exists!}$ to enable a communication-avoiding, massively-parallel implementation built on MPI. We show that Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.
Databases, Programming Languages
What problem does this paper attempt to address?
This paper addresses the limitations of Datalog when working with tree-shaped data. Datalog is a popular logic programming language, widely used in business analytics, program analysis, and ontological reasoning. However, its restriction to flat facts over atomic constants makes tree-structured data, such as derivation trees or abstract syntax trees, hard to handle. Existing extensions add existential quantification in rule heads (Datalog± and Datalog∃) or algebraic data types (Soufflé), but neither handles structured and recursive data types well: general existentials lead to complex implementations that require unification, while ADTs cannot trigger rule evaluation and do not support efficient indexing.

To overcome these limitations, the paper proposes **DL∃!**, a Datalog extension with "first-class facts". In DL∃!, each fact is identified by a unique Skolem term, which lets DL∃! handle tree-shaped data more effectively. The paper demonstrates applications of DL∃! in databases, artificial intelligence, and programming languages, and implements DL∃! as the system Slog. Slog exploits the uniqueness restriction of DL∃! to provide a communication-avoiding, massively parallel implementation built on MPI. Experimental results show that Slog outperforms other leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks and has the potential to scale to thousands of threads.

### Main Contributions

1. **Introduction of DL∃!**: The paper proposes a Datalog extension with "first-class facts", where each fact is uniquely identified by a nested Skolem term. It gives the semantics of DL∃! and introduces a language, DLS, that compiles to DL∃!. DLS supports directly nested facts and is equivalent to DL∃!.
2. **Application Demonstration**: The paper demonstrates the generality and relevance of DL∃! in multiple fields, including provenance, algebraic data types, functional programming, structural abstract interpretation, and type systems.
3. **Implementation of Slog**: The authors implement Slog, a fully functional data-parallel engine that compiles DLS code to an MPI-based runtime. Slog exploits the semantic restrictions of DL∃! and builds on recent work on balanced parallel relational algebra (BPRA) to parallelize efficiently.
4. **Performance Evaluation**: Slog is evaluated on applications such as graph analysis and program analysis, demonstrating its advantages in performance and scalability, especially when handling provenance information.

### Technical Details

- **Syntax and Semantics**: DL∃! introduces unique existential quantification (∃!) in the rule head, ensuring that each fact has a unique Skolem term. This design removes the need for unification and simplifies the implementation of the query engine.
- **Fixed-Point Semantics**: The paper defines a fixed-point semantics for DL∃!, in which new facts are generated iteratively by the immediate consequence operator.
- **Model-Theoretic Semantics**: The model-theoretic semantics of DL∃! is similar to that of Datalog, but takes the subfact-closure property into account.
- **Provenance and Algebraic Data Types**: The paper demonstrates the advantages of DL∃! in handling provenance and algebraic data types, especially in comparison with existing systems such as Soufflé.

Overall, by introducing DL∃! and Slog, the paper addresses Datalog's limitations with tree-shaped data and provides an efficient, scalable solution.
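
The ideas listed under Technical Details can be illustrated with a small Python sketch. It is not Slog, DLS syntax, or the paper's formal semantics. It assumes a fact is a nested tuple, so the fact's own structure stands in for its unique Skolem term. The names `subfacts`, `immediate_consequence`, `fixpoint`, `rule_lit`, and `rule_add` are hypothetical, and the arithmetic in `rule_add` is an illustrative shortcut rather than pure Datalog. The sketch shows subfact closure, rules triggered by nested (ADT-like) facts, and naive iteration to a fixed point.

```python
# A fact is a tuple (relation_name, arg1, ..., argN). Arguments are atoms
# (ints here) or other facts. The nested tuple doubles as the fact's unique
# identifier, standing in for a Skolem term in this sketch (an assumption).

def subfacts(fact):
    """Yield a fact and every fact nested inside it."""
    yield fact
    for arg in fact[1:]:
        if isinstance(arg, tuple):
            yield from subfacts(arg)

def immediate_consequence(db, rules):
    """Apply every rule once to db; keep the result subfact-closed."""
    new = set(db)
    for rule in rules:
        for fact in rule(db):
            new.update(subfacts(fact))
    return new

def fixpoint(facts, rules):
    """Naive evaluation: iterate the operator until nothing changes."""
    db = set()
    for fact in facts:
        db.update(subfacts(fact))
    while True:
        nxt = immediate_consequence(db, rules)
        if nxt == db:
            return db
        db = nxt

def rule_lit(db):
    # value(e, n) <- e = lit(n)
    for f in db:
        if f[0] == "lit":
            yield ("value", f, f[1])

def rule_add(db):
    # value(e, a + b) <- e = add(l, r), value(l, a), value(r, b)
    values = [(f[1], f[2]) for f in db if f[0] == "value"]
    for f in db:
        if f[0] == "add":
            _, left, right = f
            for e1, a in values:
                for e2, b in values:
                    if e1 == left and e2 == right:
                        yield ("value", f, a + b)

# An expression tree 1 + (2 + 3), written as one nested fact.
expr = ("add", ("lit", 1), ("add", ("lit", 2), ("lit", 3)))
db = fixpoint({expr}, [rule_lit, rule_add])
print(("value", expr, 6) in db)  # True
```

Because the subexpressions of `expr` are themselves facts in the database, `rule_lit` and `rule_add` fire on them directly. This is the behavior the abstract says plain ADTs lack, since those cannot trigger rule evaluation.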