Faithful Embeddings for Knowledge Base Queries

Haitian Sun, Andrew O. Arnold, Tania Bedrax-Weiss, Fernando Pereira, William W. Cohen
DOI: https://doi.org/10.48550/arXiv.2004.03658
2021-01-29
Abstract: The deductive closure of an ideal knowledge base (KB) contains exactly the logical queries that the KB can answer. However, in practice KBs are both incomplete and over-specified, failing to answer some queries that have real-world answers. Query embedding (QE) techniques have been recently proposed where KB entities and KB queries are represented jointly in an embedding space, supporting relaxation and generalization in KB inference. However, experiments in this paper show that QE systems may disagree with deductive reasoning on answers that do not require generalization or relaxation. We address this problem with a novel QE method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs. Finally we show that inserting this new QE module into a neural question-answering system leads to substantial improvements over the state-of-the-art.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
This paper addresses two main problems in knowledge base (KB) querying:

1. **Incompleteness and over-specification**:
   - In practice, KBs are often incomplete (they lack certain facts or relations) and over-specified (they contain some unnecessary details). As a result, a KB cannot answer some queries that have definite answers in the real world.
   - An ideal KB would answer all logical queries accurately through deductive reasoning, but real KBs cannot, for the reasons above.
2. **Logical fidelity of query embedding (QE) systems**:
   - QE techniques represent KB entities and queries as vectors in the same embedding space, which supports relaxation and generalization in KB reasoning. However, existing QE systems can disagree with deductive reasoning on queries that require neither generalization nor relaxation; that is, they are not sufficiently "faithful" to deductive reasoning.
   - The paper points out that current state-of-the-art QE systems (such as Query2Box) perform poorly at finding logically implied answers, possibly because these models emphasize generalization at the expense of accurately representing existing knowledge.

### Solutions

To address these problems, the paper proposes the following:

- **A new QE method, EmQL (Embedding Query Language)**:
  - EmQL combines techniques such as neural retrieval and count-min sketches to improve the logical fidelity of QE systems while retaining their ability to generalize.
  - As a result, EmQL performs better on complex queries, especially over incomplete KBs.
- **Experimental verification**:
  - Experiments show that EmQL has a significant advantage on logically implied queries and demonstrate its superiority on multi-hop KB question-answering tasks.

### Main contributions

1. **New QE scheme**: Introduces new methods for expressing set and relation operations, including set intersection, union, and relation filtering.
2. **Analysis of the deficiencies of existing QE methods**: Reveals the shortcomings of existing QE methods in terms of logical fidelity.
3. **First use of QE as a module in a KB question-answering system**: This achieves significant performance improvements on two widely used benchmark datasets.

In summary, this paper addresses the trade-off between logical fidelity and generalization in existing QE systems by proposing a query embedding method that is more faithful to deductive reasoning, thereby improving accuracy on complex queries.
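The count-min sketch mentioned under Solutions can be illustrated with a small, self-contained example. This is only a sketch of the general data structure, not EmQL's implementation: EmQL pairs a sketch with learned embeddings, which is omitted here. The names `sketch`, `estimate`, `union`, and `intersection`, the constants `DEPTH` and `WIDTH`, and the choice of elementwise sum and product for union and intersection are assumptions made for illustration. The property worth noticing is that a count-min sketch never underestimates a count, so true members of a set are never lost, which is one way to see why it can help with faithfulness.

```python
import hashlib

# Illustrative sizes (assumptions, not values from the paper).
DEPTH = 4    # number of hash rows
WIDTH = 64   # buckets per row


def _bucket(item: str, row: int) -> int:
    """Map an item to a bucket in the given row using a row-salted hash."""
    digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % WIDTH


def sketch(items):
    """Build a count-min sketch (DEPTH x WIDTH counters) for a set of items."""
    table = [[0] * WIDTH for _ in range(DEPTH)]
    for item in items:
        for row in range(DEPTH):
            table[row][_bucket(item, row)] += 1
    return table


def estimate(table, item):
    """Estimated count: the minimum over rows. It never underestimates."""
    return min(table[row][_bucket(item, row)] for row in range(DEPTH))


def union(a, b):
    """Elementwise sum: an item in either set keeps a count >= 1."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def intersection(a, b):
    """Elementwise product: an item in both sets keeps a count >= 1."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


if __name__ == "__main__":
    a = sketch(["paris", "rome", "berlin"])
    b = sketch(["rome", "berlin", "madrid"])
    both = intersection(a, b)
    # True members always pass the membership test (no false negatives).
    print(estimate(both, "rome") >= 1)
    # Non-members usually score 0, but hash collisions can cause false positives.
    print(estimate(both, "paris"))
```

The guarantee runs in one direction only: members are always retained, while a non-member can occasionally pass the test when its buckets collide with those of real members in every row. Increasing `DEPTH` or `WIDTH` makes such collisions less likely.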