When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively

Tiziano Labruna, Jon Ander Campos, Gorka Azkune
2024-05-07
Abstract: In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) using always the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
This paper explores how to make large language models (LLMs) use an information retrieval (IR) system only when additional context is needed to answer a question. Retrieval can improve answer quality in some cases, but not every question requires external information; for many, the parametric memory of the LLM is sufficient. The paper proposes a tailored training method in which the LLM learns to generate a special token, <RET>, when it cannot answer a question, signaling that retrieval is needed. In the introduction, the authors distinguish two main approaches to LLM question answering: closed-book QA, which uses only the LLM's parametric memory, and open-book QA, which combines the LLM with an IR system. The PopQA dataset shows a difference between high-popularity and low-popularity questions: the former can typically be answered from parametric memory, while the latter require retrieval.
The paper introduces the Adaptive Retrieval LLM (Adapt-LLM), which decides whether external information is needed by learning when to generate the <RET> token. On PopQA, Adapt-LLM generates <RET> when it judges that it cannot answer a question, and it achieves high accuracy on the questions it answers from parametric memory alone. Experimental results show that Adapt-LLM outperforms the fixed strategies of always retrieving and never retrieving, and is comparable to a popularity-threshold strategy even though it does not use popularity scores. The paper also highlights how strongly the quality of the IR system affects performance: Adapt-LLM performs significantly better when given gold passages than when given the passages returned by the retriever.
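The inference-time decision described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `adaptive_answer`, the prompt templates, and the `toy_generate` and `toy_retrieve` stand-ins are all hypothetical names introduced here, and a real system would replace the stand-ins with a fine-tuned LLM and an actual IR system. Only the <RET> token and the two-step flow (answer directly, or request retrieval and answer with context) come from the summary.

```python
from typing import Callable, Dict, List, Union

RET_TOKEN = "<RET>"  # special token from the paper; everything else here is illustrative


def adaptive_answer(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> Dict[str, Union[str, bool]]:
    """Answer from parametric memory unless the model asks for retrieval."""
    # Step 1: closed-book attempt. The prompt format is an assumption.
    first = generate(f"Question: {question}\nAnswer:")
    if RET_TOKEN not in first:
        return {"answer": first.strip(), "used_retrieval": False}

    # Step 2: the model emitted <RET>, so fetch context and ask again (open-book).
    context = "\n".join(retrieve(question))
    second = generate(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return {"answer": second.strip(), "used_retrieval": True}


# Hypothetical stand-ins so the sketch runs without a model or a search index.
def toy_generate(prompt: str) -> str:
    if prompt.startswith("Context:"):
        # Pretend to read the answer from the first context line.
        return prompt.splitlines()[0][len("Context: "):]
    if "capital of France" in prompt:
        return "Paris"  # a "popular" fact the toy model knows
    return RET_TOKEN  # otherwise, signal that retrieval is needed


def toy_retrieve(question: str) -> List[str]:
    return ["toy passage: the answer is 42"]


if __name__ == "__main__":
    print(adaptive_answer("What is the capital of France?", toy_generate, toy_retrieve))
    print(adaptive_answer("Who wrote an obscure book?", toy_generate, toy_retrieve))
```

In this sketch the retriever is called only on the second path, so questions the model can answer from memory incur no retrieval cost, and no popularity score is consulted at any point.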
In conclusion, this paper addresses how to train LLMs to judge when information retrieval is needed to improve QA performance, and it validates the approach experimentally on PopQA.