Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment

Zhen Zhang, Xinyu Wang, Yong Jiang, Zhuo Chen, Feiteng Mu, Mengting Hu, Pengjun Xie, Fei Huang

2024-11-09

Abstract: Large Language Models (LLMs) are increasingly recognized for their practical applications. However, these models often encounter challenges in dynamically changing knowledge, as well as in managing unknown static knowledge. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. Actually, we find that the impact of RAG on the question answering capabilities of LLMs can be categorized into three groups: beneficial, neutral, and harmful. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs, while also improving the overall performance of LLMs. This insight motivates us to differentiate between types of questions using certain metrics as indicators, to decrease the retrieval ratio without compromising performance. In our work, we propose a method that is able to identify different types of questions from this view by training a Knowledge Boundary Model (KBM). Experiments conducted on 11 English and Chinese datasets illustrate that the KBM effectively delineates the knowledge boundary, significantly decreasing the proportion of retrievals required for optimal end-to-end performance. Specifically, we evaluate the effectiveness of KBM in three complex scenarios: dynamic knowledge, long-tail static knowledge, and multi-hop problems, as well as its functionality as an external LLM plug-in.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the difficulty large language models (LLMs) have with dynamically changing knowledge and with static knowledge they have not learned. Retrieval-Augmented Generation (RAG) mitigates both problems, but every retrieval adds time and computational cost. The authors observe that the effect of RAG on an LLM's question-answering ability falls into three categories: beneficial, neutral, and harmful. Skipping the retrieval requests that lead to neutral or harmful results therefore reduces time and computational cost while also improving overall performance. Based on this insight, the authors train a Knowledge Boundary Model (KBM) to identify which type a question belongs to, lowering the retrieval ratio without sacrificing performance. Experiments on 11 English and Chinese datasets show that the KBM effectively delineates the knowledge boundary and significantly reduces the proportion of retrievals needed for optimal end-to-end performance. The KBM is evaluated in three complex scenarios (dynamic knowledge, long-tail static knowledge, and multi-hop questions) and can also serve as an external plug-in for LLMs.
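The three-way categorization and the trade-off between retrieval ratio and accuracy can be illustrated with a small sketch. This is not the authors' implementation: the `Example` fields, the labeling rule in `rag_impact`, the toy data, and the `oracle_gate` function are all assumptions made for illustration. In the paper the retrieval decision comes from a trained KBM; here an oracle that already knows the labels stands in for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    question: str
    # Hypothetical fields: whether the answer was judged correct
    # without retrieval (LLM only) and with retrieval (RAG).
    correct_without_retrieval: bool
    correct_with_retrieval: bool


def rag_impact(ex: Example) -> str:
    """Label the effect of retrieval on one question (assumed rule)."""
    if ex.correct_with_retrieval and not ex.correct_without_retrieval:
        return "beneficial"
    if ex.correct_without_retrieval and not ex.correct_with_retrieval:
        return "harmful"
    return "neutral"  # retrieval did not change correctness


def evaluate_gate(
    examples: List[Example], should_retrieve: Callable[[str], bool]
) -> Dict[str, float]:
    """Compute retrieval ratio and accuracy for a given retrieval gate."""
    retrieved = 0
    correct = 0
    for ex in examples:
        if should_retrieve(ex.question):
            retrieved += 1
            correct += ex.correct_with_retrieval
        else:
            correct += ex.correct_without_retrieval
    n = len(examples)
    return {"retrieval_ratio": retrieved / n, "accuracy": correct / n}


# Toy data, invented for illustration only (not results from the paper).
EXAMPLES = [
    Example("q1", correct_without_retrieval=False, correct_with_retrieval=True),
    Example("q2", correct_without_retrieval=True, correct_with_retrieval=True),
    Example("q3", correct_without_retrieval=False, correct_with_retrieval=False),
    Example("q4", correct_without_retrieval=True, correct_with_retrieval=False),
]

# Oracle stand-in for a trained gate: retrieve only when retrieval helps.
_LABELS = {ex.question: rag_impact(ex) for ex in EXAMPLES}


def oracle_gate(question: str) -> bool:
    return _LABELS[question] == "beneficial"


always_retrieve = evaluate_gate(EXAMPLES, lambda q: True)
gated = evaluate_gate(EXAMPLES, oracle_gate)
print(always_retrieve)
print(gated)
```

On this toy data, always retrieving gives an accuracy of 0.5 at a retrieval ratio of 1.0, while the oracle gate gives 0.75 at a ratio of 0.25, because it skips the neutral questions (saving cost) and the harmful one (recovering a correct answer). A learned gate would presumably fall somewhere between these two extremes.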

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Towards Building a Robust Knowledge Intensive Question Answering Model with Large Language Models

Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval

Knowledge Graph-Enhanced Large Language Models via Path Selection

Generative Multi-Modal Knowledge Retrieval with Large Language Models

Retrieval and Reasoning on KGs: Integrate Knowledge Graphs into Large Language Models for Complex Question Answering

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases