A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering

Lingxi Zhang, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen
2024-06-21
Abstract: Large-scale knowledge bases (KBs) like Freebase and Wikidata store millions of structured facts. Knowledge Base Question Answering (KBQA) provides a user-friendly way to access these valuable KBs by asking natural language questions. To improve the generalization capabilities of KBQA models, extensive research has embraced a retrieve-then-reason framework that retrieves relevant evidence for logical expression generation. These multi-stage efforts prioritize acquiring external sources but overlook the incorporation of new knowledge into their model parameters. In effect, even advanced language models and retrievers have knowledge boundaries, which limits the generalization capabilities of previous KBQA models. Therefore, this paper develops KBLLaMA, which follows a learn-then-reason framework to inject new KB knowledge into a large language model for flexible end-to-end KBQA. At the core of KBLLaMA, we study (1) how to organize new knowledge about KBQA and (2) how to facilitate the learning of the organized knowledge. Extensive experiments on various KBQA generalization tasks showcase the state-of-the-art performance of KBLLaMA. In particular, on the general benchmark GrailQA and the domain-specific benchmark Bio-chemical, KBLLaMA achieves performance gains of up to 3.8% and 9.8%, respectively, over the baselines.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the generalization problem in Knowledge Base Question Answering (KBQA), both within a single knowledge base (In-KB) and across different knowledge bases (Cross-KB). Earlier KBQA methods typically rely on a retrieve-then-reason framework, which fetches external evidence but does not incorporate new knowledge into the model parameters, and this limits their generalization when dealing with new knowledge. The authors propose KBLLaMA, which follows a learn-then-reason framework that injects new KB knowledge into a large language model to improve generalization across these scenarios. The main points are:

1. **Organizing New Knowledge**: New knowledge is organized as high-quality <question, logical expression> training pairs, which are used to fine-tune the model.
2. **Knowledge Learning**: The training data is enriched with a Chain-of-Thought strategy to help the model learn the new knowledge.
3. **Experimental Validation**: KBLLaMA outperforms the baselines on multiple benchmarks, with gains of up to 3.8% and 9.8% on GrailQA and the domain-specific Bio-chemical benchmark, respectively.

In summary, the paper aims to overcome the generalization limits of existing KBQA models, particularly in cross-knowledge-base scenarios, by learning KB knowledge into the model rather than only retrieving it.
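As a rough illustration of how a <question, logical expression> pair, optionally extended with a Chain-of-Thought rationale, might be turned into a fine-tuning record, here is a minimal sketch. It is not the authors' data format: the `KBQAExample` class, the `to_training_record` helper, the prompt/completion layout, and the sample question and expression are all hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class KBQAExample:
    """One hypothetical training example (field names are assumptions)."""
    question: str
    logical_expression: str
    rationale: Optional[str] = None  # optional Chain-of-Thought text


def to_training_record(example: KBQAExample) -> dict:
    """Build a prompt/completion record for supervised fine-tuning.

    The layout is an assumption: when a rationale is present it is
    placed before the final logical expression.
    """
    prompt = f"Question: {example.question}\nLogical expression:"
    if example.rationale:
        completion = f" {example.rationale}\nAnswer: {example.logical_expression}"
    else:
        completion = f" {example.logical_expression}"
    return {"prompt": prompt, "completion": completion}


# Placeholder data: the relation and entity names are invented.
example = KBQAExample(
    question="Which items are linked to entity E by example.relation?",
    logical_expression="(JOIN example.relation E)",
    rationale="The question asks for items connected to E through example.relation.",
)
record = to_training_record(example)
print(json.dumps(record, indent=2))
```

Keeping the rationale in the completion rather than the prompt means the model is trained to produce its reasoning before the logical expression; whether KBLLaMA structures its data this way is not stated in the text above.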