Abstract: In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) always using the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.

What problem does this paper attempt to address?

This paper explores how to teach large language models (LLMs) to use an information retrieval (IR) system only when additional context is needed to answer a question. Retrieval can improve answer quality in some cases, but not every question requires external information; sometimes the parametric memory of the LLM alone is sufficient. The paper proposes a tailored training method in which the LLM learns to generate a special token, "<RET>", when it does not know the answer to a question, signalling the need for retrieval.

In the introduction, the authors distinguish two main approaches to LLM question answering: closed-book QA, which uses only the parametric memory of the LLM, and open-book QA, which combines the LLM with an IR system. The PopQA dataset reveals a difference between high-popularity and low-popularity questions: the former can typically be answered from parametric memory, while the latter require retrieval.

The paper introduces the Adaptive Retrieval LLM (Adapt-LLM), which decides whether external information is needed by learning when to generate the "<RET>" token. Evaluation on the PopQA dataset shows that Adapt-LLM generates "<RET>" when it judges that it cannot answer a question, and achieves high accuracy when it answers from parametric memory alone. Adapt-LLM outperforms the fixed strategies of always retrieving and of never retrieving, and is comparable to a popularity-threshold strategy even though it does not use popularity scores. The paper also highlights how strongly IR quality affects performance: Adapt-LLM does significantly better when given gold passages than when given passages returned by the retriever.

In conclusion, this paper addresses the question of how to train LLMs to judge when retrieval is needed in order to improve QA performance, and validates the approach through experiments.
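
The inference-time decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt templates, the function name `adaptive_answer`, and the `generate` and `retrieve` callables are assumptions introduced here, standing in for a fine-tuned LLM and an off-the-shelf IR system. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, Tuple

RET_TOKEN = "<RET>"  # special token the model is trained to emit when it needs context


def adaptive_answer(
    question: str,
    generate: Callable[[str], str],   # hypothetical: prompt -> model output text
    retrieve: Callable[[str], str],   # hypothetical: question -> retrieved passage
) -> Tuple[str, bool]:
    """Return (answer, used_retrieval) for one question."""
    # First pass: ask the model with no context (closed-book).
    first = generate(f"Question: {question}\nAnswer:")
    if RET_TOKEN not in first:
        # The model answered from parametric memory.
        return first.strip(), False
    # The model asked for help: fetch a passage and ask again (open-book).
    context = retrieve(question)
    second = generate(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return second.strip(), True


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str) -> str:
    if prompt.startswith("Context:"):
        return " Canberra"
    if "France" in prompt:
        return " Paris"
    return RET_TOKEN


def toy_retrieve(question: str) -> str:
    return "Canberra is the capital city of Australia."


if __name__ == "__main__":
    print(adaptive_answer("What is the capital of France?", toy_generate, toy_retrieve))
    print(adaptive_answer("What is the capital of Australia?", toy_generate, toy_retrieve))
```

Returning the `used_retrieval` flag alongside the answer makes it easy to measure, per question, how often the model asks for retrieval and how accurate it is on each path, which is the kind of breakdown the evaluation above relies on.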

When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively

Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications

RetLLM-E: Retrieval-Prompt Strategy for Question-Answering on Student Discussion Forums

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation

Let your LLM generate a few tokens and you will reduce the need for retrieval

RRAML: Reinforced Retrieval Augmented Machine Learning

Reliable, Adaptable, and Attributable Language Models with Retrieval

RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation

JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs

Bridging the Preference Gap between Retrievers and LLMs

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Retrieval-Augmented Retrieval: Large Language Models Are Strong Zero-Shot Retriever

Retrieval meets Long Context Large Language Models

Layered Query Retrieval: an Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models

Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs

In-Context Retrieval-Augmented Language Models

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Know where to go: Make LLM a relevant, responsible, and trustworthy searchers

Large Language Models for Information Retrieval: A Survey