Abstract: Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining QA performance, priori judgement, and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. Additionally, we find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.
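The analysis described above compares what an LLM says about itself (priori and posteriori judgements) with how it actually performs. A minimal sketch of that comparison follows, assuming each question has already been labelled with the model's priori judgement, its posteriori judgement, and whether its answer is actually correct. The `Record` schema, the metric names, and the SQuAD-style exact-match check are all hypothetical illustrations, not the paper's exact metrics or code.

```python
import re
import string
from dataclasses import dataclass


def normalize(text: str) -> str:
    """SQuAD-style answer normalization (assumed here, not taken from the paper)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())  # collapse whitespace


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)


@dataclass
class Record:
    """One evaluated question (hypothetical schema)."""
    claims_answerable: bool  # priori judgement: model says it can answer
    claims_correct: bool     # posteriori judgement: model says its answer is right
    is_correct: bool         # actual correctness, e.g. from exact_match


def summarize(records: list[Record]) -> dict[str, float]:
    """Compare self-judgements with actual correctness (hypothetical metric names)."""
    if not records:
        raise ValueError("records must be non-empty")
    n = len(records)
    accuracy = sum(r.is_correct for r in records) / n
    claimed_correct = sum(r.claims_correct for r in records) / n
    return {
        "accuracy": accuracy,
        # Share of questions the model says it can answer before answering.
        "claimed_answerable": sum(r.claims_answerable for r in records) / n,
        # Share of answers the model judges correct after answering.
        "claimed_correct": claimed_correct,
        # How often the posteriori judgement matches actual correctness.
        "judgement_agreement": sum(r.claims_correct == r.is_correct for r in records) / n,
        # Positive gap: the model believes it is right more often than it is.
        "overconfidence_gap": claimed_correct - accuracy,
    }
```

For example, with four questions where the model claims three correct answers but only one is actually right, `summarize` reports an accuracy of 0.25 against a claimed-correct rate of 0.75, so the overconfidence gap is positive.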


### What problems does this paper attempt to solve?

This paper explores how well large language models (LLMs) perceive the boundaries of their factual knowledge in open-domain question answering (QA), and how retrieval augmentation affects them. It focuses on three research questions:

1. **To what extent can LLMs perceive their factual knowledge boundaries?**
   - How do LLMs judge, before answering, whether they have enough knowledge to answer a question correctly?
   - How well can LLMs assess the correctness of their own answers?
2. **What is the impact of retrieval augmentation on LLMs?**
   - Can external information (such as retrieved documents) help LLMs understand questions better and answer them more accurately?
   - Can retrieval augmentation improve LLMs' perception of their own knowledge boundaries?
3. **How do supporting documents with different characteristics affect LLMs?**
   - How do different sources of supporting documents (sparse retrieval, dense retrieval, and documents generated by ChatGPT itself) affect the answer quality and confidence of LLMs?

### Main findings of the paper

- **LLMs' perception of their factual knowledge boundaries is inaccurate, and they are overconfident**: LLMs often overestimate their abilities and tend to attempt an answer even when they lack the necessary knowledge.
- **Retrieval augmentation effectively supplements LLMs' knowledge and improves their perception of knowledge boundaries**: with supporting documents, both the answer accuracy of LLMs and their assessment of their own abilities improve significantly.
- **High-quality supporting documents significantly improve the performance and confidence of LLMs**: given high-quality supporting documents, LLMs are more likely to rely on them when generating answers, and their confidence is closely related to the relevance of the documents.
- **Dynamically introducing retrieval augmentation helps**: deciding whether to retrieve based on the LLM's priori judgement can improve answer accuracy to some extent.

### Experimental settings and methods

To answer these research questions, the authors designed several experimental settings:

- **Normal setting**: LLMs answer questions relying only on their own knowledge.
- **Retrieval-augmented setting**: LLMs answer questions with the help of externally retrieved documents.
- **Priori judgement**: LLMs judge whether they can answer a given question.
- **Posteriori judgement**: LLMs evaluate whether their answer to a given question is correct.

Using these settings, the authors systematically analyzed the behavior of LLMs under different conditions and drew the conclusions above.

### Summary

Through an in-depth analysis of LLM behavior on open-domain QA, this paper reveals the limitations of LLMs' perception of their factual knowledge boundaries and demonstrates the effectiveness of retrieval augmentation. These findings offer a useful reference for improving how LLMs use knowledge and evaluate themselves.
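The four settings and the dynamic retrieval strategy can be sketched as a small control loop: ask for a priori judgement first, and retrieve supporting documents only when the model says it cannot answer. This is a minimal sketch, assuming a generic `llm` callable that maps a prompt string to a completion and a `retrieve` callable that returns a list of documents; the prompt wording, function names, and yes/no parsing are hypothetical and are not the paper's actual prompts.

```python
from typing import Callable

LLM = Callable[[str], str]              # prompt -> completion (hypothetical interface)
Retriever = Callable[[str], list[str]]  # question -> supporting documents


def normal_prompt(question: str) -> str:
    """Normal setting: answer from parametric knowledge only."""
    return f"Answer the following question.\nQuestion: {question}\nAnswer:"


def retrieval_prompt(question: str, docs: list[str]) -> str:
    """Retrieval-augmented setting: prepend numbered supporting documents."""
    context = "\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return (f"Answer the question based on the documents below.\n{context}\n"
            f"Question: {question}\nAnswer:")


def priori_prompt(question: str) -> str:
    """Priori judgement: ask whether the model can answer before it answers."""
    return ("Can you answer the following question? Reply 'yes' or 'no'.\n"
            f"Question: {question}")


def posteriori_prompt(question: str, answer: str) -> str:
    """Posteriori judgement: ask whether a given answer is correct."""
    return (f"Question: {question}\nProposed answer: {answer}\n"
            "Is the proposed answer correct? Reply 'yes' or 'no'.")


def says_yes(reply: str) -> bool:
    """Crude parse of a yes/no reply (assumption: the reply starts with yes/no)."""
    return reply.strip().lower().startswith("yes")


def answer_with_dynamic_retrieval(question: str, llm: LLM,
                                  retrieve: Retriever) -> tuple[str, bool]:
    """Retrieve only when the priori judgement says the model cannot answer.

    Returns (answer, used_retrieval).
    """
    if says_yes(llm(priori_prompt(question))):
        return llm(normal_prompt(question)), False
    docs = retrieve(question)
    return llm(retrieval_prompt(question, docs)), True
```

The extra priori call costs one additional LLM request per question, in exchange for running retrieval only on the questions the model declines. Given the overconfidence reported above, such a policy may still skip retrieval for questions the model wrongly believes it can answer.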

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation