Abstract: Efficient knowledge management plays a pivotal role in augmenting both the operational efficiency and the innovative capacity of businesses and organizations. By indexing knowledge through vectorization, a variety of knowledge retrieval methods have emerged, significantly enhancing the efficacy of knowledge management systems. Recently, rapid advances in generative natural language processing technologies have paved the way for generating precise and coherent answers after retrieving relevant documents tailored to user queries. However, for enterprise knowledge bases, assembling extensive training data from scratch for knowledge retrieval and generation is a formidable challenge: privacy and security policies restrict the use of private data, and annotation frequently entails substantial costs. To address this challenge, we propose EKRG, a novel Retrieval-Generation framework based on large language models (LLMs), designed to enable question answering for Enterprise Knowledge bases with limited annotation costs. Specifically, for the retrieval process, we first introduce an instruction-tuning method using an LLM to generate sufficient document-question pairs for training a knowledge retriever. This method, through carefully designed instructions, efficiently generates diverse questions for enterprise knowledge bases, encompassing both fact-oriented and solution-oriented knowledge. Additionally, we develop a relevance-aware teacher-student learning strategy to further enhance the efficiency of the training process. For the generation process, we propose a novel chain-of-thought (CoT) based fine-tuning method to empower the LLM-based generator to adeptly respond to user questions using retrieved documents. Finally, extensive experiments on real-world datasets demonstrate the effectiveness of our proposed framework.

What problem does this paper attempt to address?

This paper addresses efficient question answering over enterprise knowledge bases, and in particular how to use large language models (LLMs) to strengthen knowledge retrieval and answer generation under limited annotation costs. A main challenge enterprises face when building such systems is assembling a large amount of training data from scratch for retrieval and generation: privacy and security policies limit the use of private data, and the annotation process is usually expensive. The paper therefore proposes a new retrieval-generation framework, EKRG, which uses LLMs to reduce annotation costs while providing question-answering services for enterprise knowledge bases. The key innovations of the paper are:

1. **Instruction-tuning method**: Use an LLM to generate sufficient document-question pairs to train the knowledge retriever. This method efficiently generates diverse questions for enterprise knowledge bases, covering both fact-oriented and solution-oriented knowledge.
2. **Relevance-aware teacher-student learning strategy**: Further improve the efficiency of the training process, and improve the quality of the generated document-question pairs by iteratively updating the knowledge retriever.
3. **Chain-of-thought (CoT) based fine-tuning method**: Enable the LLM-based generator to respond to user questions using the retrieved documents, adding logical reasoning steps during answer generation to improve the relevance and accuracy of the answers.

Together, these contributions target efficient, low-cost knowledge management and question answering for enterprise knowledge bases, and the paper reports experiments on real-world datasets to demonstrate their effectiveness.
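The data flow behind these three ideas can be illustrated with a minimal Python sketch. It is not the paper's implementation: the prompt wording, the function names (`build_question_prompt`, `filter_pairs`, `build_cot_prompt`), the bag-of-words cosine score, and the `0.3` threshold are all assumptions made for illustration. In particular, the simple threshold filter only stands in for the idea of using a relevance score to judge generated document-question pairs; the paper's teacher-student strategy iteratively updates a trained retriever, which this sketch does not do. No LLM is called; the functions only build prompts and score pairs.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Pair:
    """A (document, generated question) training pair for a retriever."""
    document: str
    question: str


def build_question_prompt(document: str, kind: str) -> str:
    """Build an instruction asking an LLM for one question about a document.

    `kind` is "fact" or "solution", mirroring the fact-oriented and
    solution-oriented knowledge mentioned above. The wording is hypothetical.
    """
    focus = {
        "fact": "a fact stated in the document",
        "solution": "how to solve a problem the document addresses",
    }
    if kind not in focus:
        raise ValueError(f"unknown question kind: {kind!r}")
    return (
        f"Read the document and write one question about {focus[kind]}.\n"
        f"Document: {document}\nQuestion:"
    )


def bow_cosine(a: str, b: str) -> float:
    """Stand-in relevance score: cosine similarity of bag-of-words counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def filter_pairs(pairs: list[Pair], threshold: float = 0.3) -> list[Pair]:
    """Keep pairs whose relevance score reaches the (assumed) threshold."""
    return [p for p in pairs if bow_cosine(p.document, p.question) >= threshold]


def build_cot_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt asking for step-by-step reasoning over retrieved documents."""
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(documents, 1))
    return (
        f"Documents:\n{numbered}\n\nQuestion: {question}\n"
        "First explain, step by step, which documents are relevant and why. "
        "Then give the final answer."
    )


doc = "Reset the VPN password from the account portal"
pairs = [
    Pair(doc, "How do I reset the VPN password"),
    Pair(doc, "What is on today's cafeteria menu"),
]
kept = filter_pairs(pairs)
cot_prompt = build_cot_prompt("How do I reset the VPN password", [doc])
```

In this toy run, the second pair shares no tokens with the document and is dropped, while the first is kept. A real system would replace `bow_cosine` with scores from a trained dense retriever and send the prompts to an actual LLM.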

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models

Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering

Research on Intelligent Question-Answering Systems Based on Large Language Models and Knowledge Graphs

LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System

Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering

Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering

Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval

Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models