ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models

Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin, Yifan Zhu, Luu Anh Tuan

DOI: https://doi.org/10.18653/v1/2024.findings-acl.122

2024-10-30

Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural language questions over large-scale knowledge bases (KBs), which can be summarized into two crucial steps: knowledge retrieval and semantic parsing. However, three core challenges remain: inefficient knowledge retrieval, retrieval mistakes adversely impacting semantic parsing, and the complexity of previous KBQA methods. To tackle these challenges, we introduce ChatKBQA, a novel and simple generate-then-retrieve KBQA framework, which proposes first generating the logical form with fine-tuned LLMs, then retrieving and replacing entities and relations with an unsupervised retrieval method, to improve both generation and retrieval more directly. Experimental results show that ChatKBQA achieves new state-of-the-art performance on the standard KBQA datasets WebQSP and CWQ. This work can also be regarded as a new paradigm for combining LLMs with knowledge graphs (KGs) for interpretable and knowledge-required question answering. Our code is publicly available.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper attempts to address three core challenges in Knowledge Base Question Answering (KBQA):

1. **Inefficient Knowledge Retrieval**: Traditional KBQA methods first identify the scope of candidate entities, then perform entity retrieval and relation retrieval. Due to the structural differences between natural language questions and facts in the knowledge base, most methods require training specialized models to extract and link entities, which is inefficient.
2. **Retrieval Errors Affect Semantic Parsing**: Previous methods use retrieved triples as input references for sequence-to-sequence (seq2seq) models. However, since the retrieved triples are not always accurate, this negatively impacts the results of semantic parsing. Additionally, if a large number of triples are retrieved, the seq2seq model needs to handle a longer context.
3. **Multi-step Complexity in KBQA Tasks**: Previous work decomposed the KBQA task into multiple subtasks, forming a complex pipeline that makes reproduction and transfer difficult.

To overcome these challenges, the paper proposes **ChatKBQA**, a generate-then-retrieve KBQA framework based on open-source large language models (such as Llama, ChatGLM, and Baichuan). ChatKBQA simplifies the KBQA process into two efficient stages: generating logical forms and retrieving relevant entities and relations. Through this approach, ChatKBQA achieves new state-of-the-art performance on standard KBQA datasets WebQSP and CWQ.
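The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fine-tuned LLM is replaced by a stub that returns a fixed draft, the unsupervised retrieval step is approximated with plain string similarity from Python's `difflib`, and the toy knowledge base, the `[ent: …]` / `[rel: …]` placeholder syntax, and all function names are hypothetical choices made only for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy knowledge-base vocabulary (illustrative, not from the paper).
KB_ENTITIES = ["Barack Obama", "Michelle Obama", "United States of America"]
KB_RELATIONS = [
    "people.person.spouse_s",
    "people.person.place_of_birth",
    "location.country.capital",
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(mention: str, candidates: list[str]) -> str:
    """Unsupervised retrieval stand-in: pick the most similar KB item."""
    return max(candidates, key=lambda c: similarity(mention, c))


def generate_logical_form(question: str) -> str:
    """Stub for stage 1. A fine-tuned LLM would produce this draft;
    here it is hard-coded, including a deliberately misspelled entity."""
    return "(JOIN (R [rel: person spouse]) [ent: Barak Obama])"


def ground_logical_form(draft: str) -> str:
    """Stage 2: replace each drafted entity/relation with its closest KB match."""

    def replace(match: re.Match) -> str:
        kind, mention = match.group(1), match.group(2).strip()
        candidates = KB_RELATIONS if kind == "rel" else KB_ENTITIES
        return retrieve(mention, candidates)

    return re.sub(r"\[(rel|ent):([^\]]+)\]", replace, draft)


draft = generate_logical_form("Who is Barack Obama's spouse?")
print(ground_logical_form(draft))
```

Because the logical form is drafted first, retrieval only has to match the short spans the draft already names, rather than supplying a long list of candidate triples as input to the generator.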

Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models

LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System

BT-CKBQA: an Efficient Approach for Chinese Knowledge Base Question Answering

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

GS-CBR-KBQA: Graph-Structured Case-Based Reasoning for Knowledge Base Question Answering

LFKQG: A Controlled Generation Framework with Local Fine-tuning for Question Generation over Knowledge Bases

KBQA: Learning Question Answering over QA Corpora and Knowledge Bases

How Question Generation Can Help Question Answering over Knowledge Base

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering

Knowledge-Enhanced Retrieval: A Scheme for Question Answering

ComQA: Question Answering over Knowledge Base Via Semantic Matching

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

BB-KBQA: BERT-Based Knowledge Base Question Answering

Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering

Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA

A Framework of Knowledge Graph-Enhanced Large Language Model Based on Question Decomposition and Atomic Retrieval

Make a Choice! Knowledge Base Question Answering with In-Context Learning

Augmenting Reasoning Capabilities of LLMs with Graph Structures in Knowledge Base Question Answering