Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang
2023-07-24
Abstract: Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically, we focus on three primary research questions and analyze them by examining the QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. Additionally, we find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.
Computation and Language, Information Retrieval
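The abstract analyzes LLMs through QA performance, priori judgement (can I answer?), and posteriori judgement (is my answer correct?). As a hedged illustration only, the sketch below shows one way such judgements could be scored from per-question records. The `Record` schema, the metric names (`give_up_rate`, `acc_when_gave_up`, `acc_when_answered`, `self_eval_claims_correct`, `self_eval_accuracy`) and the SQuAD-style exact-match normalization are assumptions for this example, not the paper's exact definitions.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class Record:
    """One evaluated question (hypothetical schema)."""
    prediction: str
    gold_answers: list[str]
    gave_up: bool            # priori judgement: the model said it cannot answer
    self_eval_correct: bool  # posteriori judgement: the model says its answer is correct


def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    return normalize(prediction) in {normalize(g) for g in gold_answers}


def safe_ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def judgement_metrics(records: list[Record]) -> dict[str, float]:
    correct = [exact_match(r.prediction, r.gold_answers) for r in records]
    n = len(records)
    n_gave_up = sum(r.gave_up for r in records)
    # Accuracy split by the priori judgement: a well-calibrated model should
    # rarely be right on questions it gave up on.
    right_gave_up = sum(c for r, c in zip(records, correct) if r.gave_up)
    right_answered = sum(c for r, c in zip(records, correct) if not r.gave_up)
    # The posteriori judgement is "accurate" when the self-assessment matches the EM outcome.
    eval_agree = sum(r.self_eval_correct == c for r, c in zip(records, correct))
    return {
        "accuracy": safe_ratio(sum(correct), n),
        "give_up_rate": safe_ratio(n_gave_up, n),
        "acc_when_gave_up": safe_ratio(right_gave_up, n_gave_up),
        "acc_when_answered": safe_ratio(right_answered, n - n_gave_up),
        "self_eval_claims_correct": safe_ratio(sum(r.self_eval_correct for r in records), n),
        "self_eval_accuracy": safe_ratio(eval_agree, n),
    }


if __name__ == "__main__":
    # Toy records (invented for illustration, not results from the paper).
    demo = [
        Record("Paris", ["Paris"], gave_up=False, self_eval_correct=True),
        Record("Lyon", ["Marseille"], gave_up=False, self_eval_correct=True),
        Record("unknown", ["1887"], gave_up=True, self_eval_correct=False),
        Record("The Beatles", ["Beatles"], gave_up=False, self_eval_correct=True),
    ]
    print(judgement_metrics(demo))
```

In this toy data the model claims three of its four answers are correct while only two actually are, which is the kind of gap between self-assessment and accuracy that the abstract describes as overconfidence.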
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper explores how well large language models (LLMs) perceive the boundaries of their factual knowledge in open-domain question-answering (QA) tasks, and how retrieval augmentation affects them. Specifically, it focuses on three research questions:

1. **To what extent can LLMs perceive their factual knowledge boundaries?**
   - Examine how LLMs judge whether they have enough knowledge to answer a question accurately.
   - Analyze how well LLMs evaluate the correctness of their own answers.
2. **What is the impact of retrieval augmentation on LLMs?**
   - Explore whether introducing external information (such as retrieved documents) helps LLMs understand questions and answer them more accurately.
   - Examine whether retrieval augmentation sharpens LLMs' perception of their own knowledge boundaries.
3. **How do support documents with different characteristics affect LLMs?**
   - Compare how different types of support documents (obtained by sparse retrieval, by dense retrieval, or generated by ChatGPT itself) affect the answer quality and confidence of LLMs.

### Main findings of the paper

- **LLMs' perception of their factual knowledge boundaries is inaccurate and overconfident**: LLMs often overestimate their abilities and tend to attempt an answer even when they lack the necessary knowledge.
- **Retrieval augmentation effectively supplements LLMs' knowledge and improves their perception of knowledge boundaries**: With support documents, both the answer accuracy of LLMs and their evaluation of their own abilities improve significantly.
- **High-quality support documents significantly improve the performance and confidence of LLMs**: Given high-quality support documents, LLMs are more likely to rely on them when generating answers, and their confidence is closely related to the relevance of the documents.
- **Effect of dynamically introducing retrieval augmentation**: Deciding whether to introduce retrieval augmentation based on the LLM's priori judgement can improve answer accuracy to a certain extent.

### Experimental settings and methods

To answer these research questions, the authors designed several experimental settings:

- **Normal setting**: LLMs answer questions relying only on their own knowledge.
- **Retrieval-augmented setting**: LLMs answer questions with the help of externally retrieved documents.
- **Priori judgement**: LLMs judge whether they can answer a given question.
- **Posteriori judgement**: LLMs evaluate whether their answer to a given question is correct.

Using these settings, the authors systematically analyzed the behavior of LLMs under different conditions and drew the conclusions above.

### Summary

Through an in-depth analysis of LLMs' behavior in open-domain question-answering tasks, this paper reveals the limits of LLMs' perception of their factual knowledge boundaries and demonstrates the effectiveness of retrieval augmentation. These findings offer a useful reference for improving LLMs' knowledge utilization and self-evaluation mechanisms.
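The four experimental settings, together with the finding on dynamic retrieval augmentation, can be illustrated with a small model-agnostic sketch. Everything here is hypothetical: the prompt wording, the `LLM` and `Retriever` callable types, the Yes/No reply convention, and the `answer_dynamically` routing function are assumptions made for illustration. The paper's actual prompts and pipeline are in its repository. The sketch gates retrieval on the priori judgement: the model answers from memory when it claims it can, and receives support documents otherwise.

```python
from typing import Callable

# Hypothetical interfaces: an LLM maps a prompt to text; a retriever maps a
# question and a document count k to support documents (sparse, dense, or generated).
LLM = Callable[[str], str]
Retriever = Callable[[str, int], list[str]]


def format_docs(docs: list[str]) -> str:
    return "\n".join(f"Passage-{i}: {d}" for i, d in enumerate(docs, start=1))


def normal_prompt(question: str) -> str:
    # Normal setting: the model relies only on its own (parametric) knowledge.
    return f"Answer the following question.\nQuestion: {question}\nAnswer:"


def retrieval_prompt(question: str, docs: list[str]) -> str:
    # Retrieval-augmented setting: support documents are placed before the question.
    return (f"Given the following information:\n{format_docs(docs)}\n"
            f"Answer the following question.\nQuestion: {question}\nAnswer:")


def priori_prompt(question: str) -> str:
    # Priori judgement: ask whether the model can answer before it answers.
    return ("Can you answer the following question based on your own knowledge? "
            f"Reply 'Yes' or 'No'.\nQuestion: {question}")


def posteriori_prompt(question: str, answer: str) -> str:
    # Posteriori judgement: ask the model to assess an answer it already gave.
    return (f"Question: {question}\nProposed answer: {answer}\n"
            "Is the proposed answer correct? Reply 'Yes' or 'No'.")


def says_yes(reply: str) -> bool:
    return reply.strip().lower().startswith("yes")


def answer_dynamically(question: str, llm: LLM, retrieve: Retriever,
                       k: int = 5) -> tuple[str, bool]:
    """Retrieve only when the priori judgement says the model cannot answer.

    Returns the answer and a flag telling whether retrieval was used.
    """
    if says_yes(llm(priori_prompt(question))):
        return llm(normal_prompt(question)), False
    docs = retrieve(question, k)
    return llm(retrieval_prompt(question, docs)), True


if __name__ == "__main__":
    # Toy stand-ins: this fake model claims to know only questions about France.
    def toy_llm(prompt: str) -> str:
        if prompt.startswith("Can you answer"):
            return "Yes" if "France" in prompt else "No"
        if "Given the following information" in prompt:
            return "Answer drawn from passages"
        return "Answer from memory"

    def toy_retriever(question: str, k: int) -> list[str]:
        return [f"document {i} about {question}" for i in range(k)]

    print(answer_dynamically("What is the capital of France?", toy_llm, toy_retriever))
    print(answer_dynamically("Who won the 1887 chess match?", toy_llm, toy_retriever))
```

Since the paper finds LLMs overconfident, a gate like this one would skip retrieval on some questions the model cannot in fact answer. That is one plausible reason dynamic retrieval improves accuracy only "to a certain extent".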