KBQA: Learning Question Answering over QA Corpora and Knowledge Bases

Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, Wei Wang
DOI: https://doi.org/10.14778/3055540.3055549
2019-03-06
Abstract: Question answering (QA) has become a popular way for humans to access billion-scale knowledge bases. Unlike web search, QA over a knowledge base gives out accurate and concise results, provided that natural language questions can be understood and mapped precisely to structured queries over the knowledge base. The challenge, however, is that a human can ask one question in many different ways. Previous approaches have natural limits due to their representations: rule based approaches only understand a small set of "canned" questions, while keyword based or synonym based approaches cannot fully understand the questions. In this paper, we design a new kind of question representation: templates, over a billion scale knowledge base and a million scale QA corpora. For example, for questions about a city's population, we learn templates such as What's the population of $city?, How many people are there in $city?. We learned 27 million templates for 2782 intents. Based on these templates, our QA system KBQA effectively supports binary factoid questions, as well as complex questions which are composed of a series of binary factoid questions. Furthermore, we expand predicates in RDF knowledge base, which boosts the coverage of knowledge base by 57 times. Our QA system beats all other state-of-art works on both effectiveness and efficiency over QALD benchmarks.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the design of an efficient and accurate question-answering system over a large-scale knowledge base. Specifically, it focuses on how to understand natural-language questions and map them precisely to structured queries, so that accurate answers can be retrieved from the knowledge base. This involves two main challenges:

1. **Representation Design**: How to represent natural-language questions so that they can be understood. Questions cover thousands of intents, and each intent may be expressed in thousands of different ways. For example, "How many people are there in Honolulu?" and "What is the population of Honolulu?" have the same semantics although they are worded differently. The representation must therefore identify questions with the same semantics and distinguish questions with different intents.
2. **Semantic Matching**: Once the representation of a question is determined, how to map it to a structured query over the knowledge base. For binary factoid questions (BFQs), the structured query depends mainly on a predicate in the knowledge base. However, because of the gap between natural-language questions and knowledge-base predicates, this mapping is hard to find. For example, in Table 1 of the paper, the system needs to know that "How many people are there in Honolulu?" corresponds to the predicate "population". In addition, many binary relations are not represented by a single edge in the RDF graph but by a multi-edge path. For example, the "spouse" relationship is represented by the path "marriage → person → name" in Figure 1 of the paper.

To address these challenges, the paper proposes a new template-based method. It learns a large number of templates to represent and understand natural-language questions and maps these templates to predicates in the knowledge base. This lets it handle not only simple binary factoid questions but also complex questions, which are answered by decomposing them into a series of binary factoid questions. The templates and their mappings to knowledge-base predicates are learned from Yahoo! Answers, which enables effective question answering over large-scale knowledge bases.
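
To make the template idea concrete, here is a minimal sketch of the lookup step, not the paper's implementation. It assumes a tiny hand-written knowledge base and a hard-coded template table, whereas KBQA learns the template-to-predicate mapping from a QA corpus. All names (`TRIPLES`, `ENTITY_CONCEPTS`, `TEMPLATE_TO_PATH`, `to_template`, `follow_path`, `answer`) and the stored values are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Toy RDF-like graph (illustrative data): (subject, predicate) -> objects.
TRIPLES = {
    ("Honolulu", "population"): ["390K"],
    ("Barack Obama", "marriage"): ["m1"],   # intermediate marriage node
    ("m1", "person"): ["p1"],               # intermediate person node
    ("p1", "name"): ["Michelle Obama"],
}

# Assumed entity -> concept placeholder mapping (hand-written here).
ENTITY_CONCEPTS = {"Honolulu": "$city", "Barack Obama": "$person"}

# Assumed template -> predicate path table. In the paper this mapping is
# learned from data; here it is hard-coded for illustration.
TEMPLATE_TO_PATH = {
    "how many people are there in $city?": ("population",),
    "what is the population of $city?": ("population",),
    "who is the wife of $person?": ("marriage", "person", "name"),
}


def to_template(question: str) -> Optional[Tuple[str, str]]:
    """Replace the first known entity mention with its concept placeholder."""
    lowered = question.lower()
    for entity, concept in ENTITY_CONCEPTS.items():
        idx = lowered.find(entity.lower())
        if idx >= 0:
            template = question[:idx] + concept + question[idx + len(entity):]
            return template.lower(), entity
    return None


def follow_path(entity: str, path: Tuple[str, ...]) -> List[str]:
    """Walk a predicate path edge by edge, starting from the entity."""
    frontier = [entity]
    for predicate in path:
        frontier = [o for s in frontier for o in TRIPLES.get((s, predicate), [])]
    return frontier


def answer(question: str) -> List[str]:
    """Map a question to a template, then to a predicate path, then to values."""
    parsed = to_template(question)
    if parsed is None:
        return []
    template, entity = parsed
    path = TEMPLATE_TO_PATH.get(template)
    return follow_path(entity, path) if path is not None else []


if __name__ == "__main__":
    print(answer("How many people are there in Honolulu?"))
    print(answer("What is the population of Honolulu?"))
    print(answer("Who is the wife of Barack Obama?"))
```

In this sketch, the two differently worded population questions reduce to different templates that share the same predicate, which is the representation problem described above. The "wife" question resolves through a three-edge path rather than a single predicate, mirroring the "marriage → person → name" example.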