RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi, Eunsol Choi
2024-10-26
Abstract: We investigate whether in-context examples, widely used in decoder-only language models (LLMs), can improve embedding model performance in retrieval tasks. Unlike in LLMs, naively prepending in-context examples (query-document pairs) to the target query at inference time does not work out of the box. We introduce a simple approach to enable retrievers to use in-context examples. Our approach, RARe, finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query. This can be applied to adapt various base architectures (e.g., decoder-only language models, retriever models) and consistently achieves performance gains of up to +2.72% nDCG across various open-domain retrieval datasets (BeIR, RAR-b). In particular, we find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples, similar to what is seen for in-context learning in LLMs. We further provide analysis on the design choices of in-context example augmentation and lay the foundation for future work in this space.
Computation and Language, Artificial Intelligence, Information Retrieval
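The abstract's recipe (find training examples whose queries resemble the target query, then prepend those query-document pairs to it) can be illustrated with a short sketch. It covers only the construction of the augmented query, not the contrastive fine-tuning. Everything here is an assumption for illustration: the from-scratch Okapi BM25 (with common defaults `k1=1.5`, `b=0.75` and a whitespace tokenizer), the hypothetical helpers `BM25` and `augment_query`, the rule that skips an example whose query equals the target, and the `Instruct:`/`Query:`/`Document:` template, which is not the paper's exact prompt format.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase + whitespace split.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 index over short texts (here: the example queries)."""

    def __init__(self, texts: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(t)) for t in texts]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        n = len(texts)
        # Document frequency: each unique term counted once per text.
        df = Counter(term for tf in self.tfs for term in tf)
        # Non-negative IDF variant (as used by Lucene).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf = self.tfs[i]
        norm = 1 - self.b + self.b * self.lens[i] / self.avgdl  # length normalization
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def rank(self, query: str) -> list[int]:
        # All indices, best score first; ties broken by index for determinism.
        scores = [self.score(query, i) for i in range(len(self.tfs))]
        return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def augment_query(instruction, query, pool, index, k=2):
    """Prepend the k most similar (query, document) pairs from `pool` to `query`.

    The prompt template is hypothetical, not the paper's exact format.
    """
    # Assumed rule: skip pairs whose query equals the target (e.g. its own training pair).
    chosen = [i for i in index.rank(query) if pool[i][0] != query][:k]
    parts = [f"Instruct: {instruction}"]
    for i in chosen:
        ex_query, ex_doc = pool[i]
        parts.append(f"Query: {ex_query}\nDocument: {ex_doc}")
    parts.append(f"Query: {query}")  # the target query always comes last
    return "\n\n".join(parts)


pool = [
    ("what causes tides", "Tides are caused by the gravitational pull of the moon and sun."),
    ("how do vaccines work", "Vaccines train the immune system to recognize pathogens."),
    ("what causes earthquakes", "Earthquakes result from the sudden release of energy along faults."),
]
index = BM25([q for q, _ in pool])
print(augment_query("Given a question, retrieve passages that answer it",
                    "what causes volcanoes", pool, index))
```

For the target "what causes volcanoes", the two "what causes ..." examples outrank the vaccine example, so those two pairs are prepended. Swapping BM25 for a dense retriever would change only the `rank` step.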
What problem does this paper attempt to address?
This paper addresses how to use in-context examples to improve the performance of retrieval models. Specifically, the authors study how embedding models can benefit from in-context examples whose queries are semantically similar to the target query. Unlike with decoder-only language models (LLMs), simply prepending in-context examples to the target query at inference time does not improve performance. The authors therefore propose RARe (Retrieval Augmented Retrieval with In-Context Examples), which fine-tunes a pre-trained model so that it learns to use these in-context examples.

### Main Contributions
1. **Introduction of RARe**: A fine-tuning method that adapts pre-trained models to use in-context examples in retrieval tasks.
2. **Performance Improvement Across Multiple Base Architectures**: The method applies to various base architectures (such as decoder-only language models and existing retrieval models) and yields significant gains on multiple tasks.
3. **Detailed Analysis**: The authors analyze how the quality, quantity, and selection of in-context examples affect performance, explaining where the experimental gains come from.

### Method Overview
- **Query Augmentation**: A sparse retriever such as BM25 finds in-context examples (query-document pairs) whose queries are similar to the target query, and these examples are prepended to the original query.
- **Fine-Tuning**: The model is fine-tuned with a contrastive loss so that it learns to make use of these in-context examples.

### Experimental Setup
- **Benchmark Datasets**: Evaluation uses widely used retrieval benchmarks such as BeIR and RAR-b.
- **Baseline Models**: Strong models including SFR-Embedding-2-R, LLM2Vec-Llama-3-8B-Supervised, and E5-Mistral-7B-Instruct.
- **Evaluation Metrics**: nDCG@10 is the main metric for measuring the quality of retrieval results.

### Experimental Results
- **Inference-Time Modification**: Adding in-context examples only at inference time, without fine-tuning, degrades performance.
- **Training from LLM Checkpoints**: When trained from LLM checkpoints, RARe achieves significant gains on multiple benchmarks; on RAR-b in particular, the absolute gain reaches +2.72%.
- **Continuing Training from Retriever Checkpoints**: When existing retriever models are trained further with RARe, performance also improves on most tasks; on out-of-domain tasks in particular, it improves by 1.95% over the instruction-only baseline.

### Discussion and Analysis
- **Selection of In-Context Examples**: Retrieved in-context examples are more effective than randomly selected ones.
- **Relevance of In-Context Examples**: RARe's gains are largest when the in-context examples are highly relevant to the target query.
- **Quantity of In-Context Examples**: Adding more in-context examples usually improves performance, but the optimal number varies by dataset.
- **Efficiency Analysis**: Adding in-context examples increases the latency of the retrieval pipeline, but for large-scale datasets this added latency is relatively small.

In conclusion, the paper proposes RARe, a method that enables retrieval models to use in-context examples to improve performance, and lays the groundwork for future research in this area.
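The results above are reported in nDCG@10, so a minimal sketch of that metric may help when reading the numbers. It assumes linear gains (graded relevance used directly as the gain) and the usual log2(rank + 1) discount; some implementations use exponential gains 2^rel - 1 instead. The function names `dcg_at_k` and `ndcg_at_k` and the toy relevance judgments are hypothetical.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    # Item at 0-based index r sits at rank r + 1 and is discounted by log2(rank + 1) = log2(r + 2).
    return sum(g / math.log2(r + 2) for r, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gains; qrels maps doc id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]  # unjudged docs count as 0
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering of judged docs
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


qrels = {"d1": 1, "d2": 1}  # two relevant documents for one query
print(round(ndcg_at_k(["d3", "d1", "d7"], qrels), 4))  # → 0.3869
print(ndcg_at_k(["d1", "d2", "d3"], qrels))            # → 1.0
```

Benchmark scores such as those on BeIR are typically this per-query value averaged over all test queries.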