Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, Jack Hessel
2024-09-17
Abstract: Instruction-tuned language models (LMs) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts. In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. To train Promptriever, we curate and release a new instance-level instruction training set from MS MARCO, spanning nearly 500k instances. Promptriever not only achieves strong performance on standard retrieval tasks, but also follows instructions. We observe: (1) large gains (reaching SoTA) on following detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), (2) significantly increased robustness to lexical choices/phrasing in the query+instruction (+12.9 Robustness@10 on InstructIR), and (3) the ability to perform hyperparameter search via prompting to reliably improve retrieval performance (+1.4 average increase on BEIR). Promptriever demonstrates that retrieval models can be controlled with prompts on a per-query basis, setting the stage for future work aligning LM prompting techniques with information retrieval.
Information Retrieval, Computation and Language, Machine Learning
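The FollowIR gains in the abstract are reported in p-MRR, a pairwise metric that compares where a document ranks under the original instruction and under a modified instruction that makes it non-relevant. The sketch below follows our understanding of the FollowIR definition (per-document scores in (-1, 1), averaged within each query and then across queries). It is not the benchmark's official code, the function names are hypothetical, and reported values such as +14.3 appear to be this score scaled by 100.

```python
from statistics import mean
from typing import Dict, List, Tuple


def document_score(rank_og: int, rank_new: int) -> float:
    """Score one document that the changed instruction made non-relevant.

    rank_og / rank_new are 1-based ranks under the original and changed instruction.
    Moving the document down (the desired behavior) gives a score in (0, 1),
    moving it up gives a score in (-1, 0), and no change gives 0.
    """
    rr_og, rr_new = 1.0 / rank_og, 1.0 / rank_new
    if rank_og > rank_new:  # document moved up: penalize
        return rr_og / rr_new - 1.0
    return 1.0 - rr_new / rr_og  # document moved down or stayed put


def p_mrr(ranks_by_query: Dict[str, List[Tuple[int, int]]]) -> float:
    """Average document scores within each query, then average across queries."""
    return mean(
        mean(document_score(og, new) for og, new in pairs)
        for pairs in ranks_by_query.values()
    )


if __name__ == "__main__":
    # q1: one document moved down (good), one moved up (bad); q2: one moved down.
    example = {"q1": [(2, 4), (4, 2)], "q2": [(1, 2)]}
    print(p_mrr(example))  # prints 0.25
```

Averaging per query first keeps queries with many affected documents from dominating the final score.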
### What problems does this paper attempt to solve?

This paper addresses the limited flexibility and accuracy of information retrieval (IR) models when users express their needs as natural language instructions. Traditional retrieval models match queries and documents with a single, fixed semantic similarity score, which makes the user experience rigid: users must repeatedly adjust keywords or fall back on advanced search settings to find the documents they need.

To address this, the paper proposes **Promptriever**, a retrieval model that can be controlled with natural language prompts, much like a language model. Unlike traditional retrieval models, Promptriever adjusts its notion of relevance according to the specific instruction given with each query, yielding more flexible and accurate retrieval results. The main problems the paper tackles are:

1. **Enhancing the instruction-following ability of retrieval models**
   - After standard IR training, retrieval models lose the ability to respond to natural language instructions. By adding an instruction dataset to training, the paper enables Promptriever to retain the instruction-following ability of its underlying language model.
2. **Improving the retrieval model's understanding of complex instructions**
   - Promptriever can handle complex instructions, including detailed definitions of relevance, and its retrieval performance can be improved through prompting in a zero-shot setting. For example, a user can state specific retrieval conditions in natural language, such as "Only retrieve James Cameron movies released before 2022 that were not co-directed".
3. **Increasing the robustness of the retrieval model**
   - Promptriever is more robust to changes in query length and wording, reducing the performance fluctuations caused by different phrasings of the same request. For example, its performance variance on BEIR is reduced by 44%, and it gains 12.9 points on the Robustness@10 metric of InstructIR.
4. **Achieving zero-shot prompt optimization**
   - Promptriever's retrieval performance can be reliably improved with simple natural language prompts (such as "Carefully consider relevance and I will tip you"), which makes prompt engineering and automatic prompt optimization applicable to retrieval.

### Summary

The paper shows that, given appropriate training data, modern dual-encoder retrieval models can acquire the ability to follow natural language instructions, substantially improving retrieval performance and user experience. Promptriever performs well on standard retrieval tasks and reaches state-of-the-art results on instruction-following tasks.
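The mechanism described in the list above (a dual encoder whose query side receives a per-query instruction, with the prompt treated as a tunable hyperparameter) can be sketched as follows. This is a minimal, hypothetical illustration, not Promptriever's implementation: `encode` is a toy bag-of-words stand-in for the neural encoder, appending the instruction to the query text is an assumed input format, and mean reciprocal rank on a tiny hand-made dev set stands in for a real benchmark such as BEIR. With a bag-of-words encoder, a generic prompt like "Carefully consider relevance..." merely adds tokens; the reliable gains reported in the paper depend on an instruction-trained model.

```python
import math
import re
import statistics
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def encode(text: str) -> Counter:
    """Toy stand-in for a neural encoder (hypothetical): lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, instruction: str, docs: Dict[str, str]) -> List[str]:
    """Rank document ids for one query; the instruction is appended to the query text.

    Documents are encoded without seeing the query or instruction, as in a dual
    encoder, so only the query side changes from prompt to prompt.
    """
    query_vec = encode(f"{query} {instruction}")
    doc_vecs = {doc_id: encode(text) for doc_id, text in docs.items()}
    # sorted() is stable, so tied documents keep their original order.
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def reciprocal_rank(ranking: List[str], relevant_id: str) -> float:
    return 1.0 / (ranking.index(relevant_id) + 1)


def select_prompt(
    prompts: Sequence[str],
    dev_set: Sequence[Tuple[str, str]],
    docs: Dict[str, str],
) -> Tuple[str, Dict[str, float]]:
    """Treat the prompt as a hyperparameter: return the prompt with the best mean
    reciprocal rank on the dev set, plus the score of every candidate."""
    scores = {
        prompt: statistics.mean(
            reciprocal_rank(retrieve(query, prompt, docs), relevant_id)
            for query, relevant_id in dev_set
        )
        for prompt in prompts
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    docs = {
        "d1": "Titanic is a film directed by James Cameron",
        "d2": "James Cameron early life and biography of the Canadian filmmaker",
    }
    dev_set = [("james cameron", "d2")]  # (query, id of the relevant document)
    best, scores = select_prompt(["", "focus on biography"], dev_set, docs)
    print("best prompt:", repr(best))
    # Spread of dev scores across prompts: a rough proxy for prompt sensitivity.
    print("std across prompts:", statistics.pstdev(list(scores.values())))
```

Because only the query side depends on the prompt, document embeddings can be computed once and reused while candidate prompts are compared; the sketch re-encodes them on every call only for brevity.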