Conjunctive Queries for Logic-Based Information Extraction

Sam M. Thompson
DOI: https://doi.org/10.48550/arXiv.2208.01298
2022-08-02
Abstract: This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. The second approach is to consider the dynamic complexity of FC.
Logic in Computer Science, Databases, Formal Languages and Automata Theory
What problem does this paper attempt to address?
This paper studies logic-based conjunctive queries for Information Extraction (IE). The author proposes two logic-based approaches:

1. **Introducing conjunctive query fragments**: The first approach introduces the conjunctive query fragments FC-CQ and FC[REG]-CQ. These are fragments of FC, a first-order logic based on word equations whose semantics restrict the universe to the factors of a finite input word. FC[REG] extends FC with regular constraints.
2. **Dynamic complexity analysis**: The second approach examines information extraction from the perspective of dynamic complexity. Words are encoded as relational structures, and the dynamic descriptive complexity classes are studied in a setting where symbols can be modified.

### Main problems and contributions

- **Comparison of expressive power**: The paper first compares the expressive power of FC[REG]-CQ with that of document spanners, and finds that certain fragments correspond to known language generators such as patterns and regular expressions.
- **Undecidability of decision problems**: The author proves that many decision problems for FC-CQ and FC[REG]-CQ, such as equivalence and regularity, are undecidable. In addition, the model-checking problem is NP-complete even for acyclic FC-CQ.
- **Proposal of tractable fragments**: To find tractable fragments, the author explores decomposing word equations into binary word equations. If a query contains only binary word equations and is acyclic, then model checking is tractable and the results can be enumerated efficiently.
- **Dynamic complexity**: The author studies how to maintain query results when a single position of the word is modified, and proves that Dynamic Conjunctive Queries (DynCQ) are more expressive than core spanners and that Dynamic First-Order Logic (DynFO) is more expressive than generalized core spanners.

In summary, the paper introduces new logical tools for conjunctive queries in information extraction and analyzes them with respect to expressive power, tractability, and dynamic maintenance.
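To make the FC semantics described above concrete, the following is a minimal brute-force sketch of evaluating a conjunction of word equations, where every variable ranges only over the factors of the input word. The function names, the tuple representation of equations, and the reserved name `U` for the whole input word are assumptions made for this illustration and are not taken from the thesis. The search is exponential in the number of variables, so it only illustrates the semantics and says nothing about the tractable fragments.

```python
from itertools import product


def factors(word):
    """Return all factors (contiguous substrings) of word, including the empty word."""
    return {word[i:j] for i in range(len(word) + 1) for j in range(i, len(word) + 1)}


def satisfying_assignments(word, variables, equations):
    """Brute-force evaluation of a conjunction of word equations over `word`.

    Hypothetical representation (an assumption of this sketch):
      - variables: list of upper-case variable names, e.g. ["X", "Y"]
      - equations: list of (lhs, rhs) pairs, where lhs is a variable name
        (or "U" for the whole input word) and rhs is a list of terms;
        a term is a variable name or a lower-case terminal string.
    """
    universe = sorted(factors(word))  # the universe is restricted to factors
    results = []
    for values in product(universe, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        sigma["U"] = word  # reserved name for the input word itself
        # sigma.get(t, t): substitute variables, keep terminals as they are
        if all(
            sigma[lhs] == "".join(sigma.get(t, t) for t in rhs)
            for lhs, rhs in equations
        ):
            results.append({v: sigma[v] for v in variables})
    return results


# Is the input word a square, i.e. U = X X for some factor X?
print(satisfying_assignments("abab", ["X"], [("U", ["X", "X"])]))

# Split the input word around an occurrence of the terminal "b": U = X b Y.
print(satisfying_assignments("abab", ["X", "Y"], [("U", ["X", "b", "Y"])]))
```

Because each variable can only take a factor of the input word as its value, the search space is finite for any fixed word, which is what makes this exhaustive check possible at all.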
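The update setting behind the dynamic complexity results can be pictured with a small imperative analogy. The sketch below keeps an auxiliary set of match positions for one fixed pattern and repairs it locally after a single-position update instead of recomputing it from scratch. It is not the DynCQ or DynFO formalism, where auxiliary relations are updated by logical formulas. The class `DynamicFactorIndex`, its methods, and the fixed-length word model are assumptions made purely for illustration.

```python
class DynamicFactorIndex:
    """Maintain the start positions of a fixed non-empty pattern in a word
    under single-position symbol updates (illustrative analogy only)."""

    def __init__(self, word, pattern):
        self.symbols = list(word)
        self.pattern = pattern
        # Auxiliary data: all start positions where the pattern occurs.
        self.matches = {i for i in range(len(word)) if self._matches_at(i)}

    def _matches_at(self, i):
        m = len(self.pattern)
        return (
            i >= 0
            and i + m <= len(self.symbols)
            and "".join(self.symbols[i:i + m]) == self.pattern
        )

    def set_symbol(self, k, symbol):
        """Overwrite position k and repair only the affected start positions."""
        self.symbols[k] = symbol
        m = len(self.pattern)
        # Only occurrences that overlap position k can change.
        for i in range(k - m + 1, k + 1):
            if self._matches_at(i):
                self.matches.add(i)
            else:
                self.matches.discard(i)

    def contains_pattern(self):
        return bool(self.matches)


index = DynamicFactorIndex("aaaa", "ab")
print(index.contains_pattern())   # no occurrence of "ab" yet
index.set_symbol(1, "b")          # word becomes "abaa"
print(sorted(index.matches))
index.set_symbol(3, "b")          # word becomes "abab"
print(sorted(index.matches))
```

Each update touches at most as many start positions as the pattern is long, so the work per update does not depend on the length of the word.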