Abstract: The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
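The two-stage design the abstract describes (a dense retriever that narrows the collection, followed by a pointwise reranker that rescores each candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written vectors stand in for model embeddings, the dot product is an assumed similarity function, and `toy_score` is a hypothetical placeholder for a fine-tuned reranker.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity function)."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, doc_vecs, k):
    """Stage 1: rank every document by embedding similarity and keep the top k."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: dot(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def rerank(query, candidates, texts, score_fn):
    """Stage 2: score each (query, document) pair independently, then sort."""
    return sorted(candidates,
                  key=lambda doc_id: score_fn(query, texts[doc_id]),
                  reverse=True)


def toy_score(query, text):
    """Hypothetical stand-in for a reranker: number of shared lowercase terms."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Toy collection; the vectors are made up for illustration only.
texts = {
    "d1": "llama dense retrieval",
    "d2": "cooking pasta at home",
    "d3": "reranking with llama models",
}
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.7, 0.3]}

candidates = retrieve([1.0, 0.0], doc_vecs, k=2)
final = rerank("llama reranking models", candidates, texts, toy_score)
print(candidates, final)
```

The point of the split is cost: the first stage compares one query vector against precomputed document vectors, so the more expensive pairwise scorer only has to run on the short candidate list.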

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **how to use the latest large language models (LLMs) to improve the effectiveness of multi-stage text retrieval systems**. Specifically, the authors fine-tune the latest LLaMA model as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) to improve both passage retrieval and document retrieval.

### Main Problem Background

1. **Limitations of Existing Methods**
   - Most existing research uses earlier pre-trained language models and does not take advantage of recent progress in LLMs.
   - Traditional methods rely on segmenting and pooling strategies to handle long documents, which may lead to information loss.
   - Existing zero-shot methods do not fully explore the potential of LLMs.
2. **Research Motivation**
   - To explore whether state-of-the-art LLMs can bring significant improvements to multi-stage text retrieval.
   - To verify whether LLMs can represent complete long documents directly, avoiding traditional segmenting and pooling strategies.
   - To evaluate RepLLaMA and RankLLaMA on different datasets, especially their effectiveness in zero-shot settings.

### Research Objectives

- **Verify the Superiority of LLMs**: Test experimentally whether the LLaMA model outperforms smaller models in multi-stage text retrieval.
- **Optimize the Multi-stage Retrieval Pipeline**: Fine-tune the LLaMA model for both the retrieval stage and the reranking stage of the pipeline.
- **Explore Zero-shot Capability**: Evaluate the zero-shot effectiveness of RepLLaMA and RankLLaMA on unseen datasets to assess how well they generalize.

### Experimental Design and Results

- **Datasets**: MS MARCO is used for fine-tuning and in-domain evaluation (passage and document retrieval); BEIR is used for zero-shot evaluation.
- **Model Fine-tuning**: RepLLaMA is fine-tuned as the retriever and RankLLaMA as the reranker.
- **Evaluation Metrics**: Standard metrics such as MRR@10 and nDCG@10 are used for evaluation.

The experimental results show that RepLLaMA and RankLLaMA perform well on multiple benchmarks, and that the advantages of LLaMA are most visible on long documents. The zero-shot experiments on BEIR also indicate strong generalization.

### Conclusion

By fine-tuning the LLaMA model, the authors demonstrate the potential of LLMs in multi-stage text retrieval: the approach improves the effectiveness of both retrieval and reranking, and it simplifies long-document processing by removing the need for segmenting and pooling. The results suggest that future multi-stage text retrieval systems can rely more on advanced LLMs to achieve better retrieval effectiveness.
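For readers unfamiliar with the two metrics named above, the following is a minimal per-query sketch of MRR@10 and nDCG@10. It assumes the common linear-gain form of DCG (gain divided by log2 of rank plus one); evaluation toolkits may use other variants, and the function names and toy document IDs here are illustrative only. Reported scores are averages of these per-query values over all queries.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain with linear gain and a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, graded_rels, k=10):
    """DCG of the ranking divided by the DCG of an ideal ordering of the judgments."""
    gains = [graded_rels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(graded_rels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Toy example: the only relevant document appears at rank 2.
ranking = ["d3", "d1", "d7"]
print(mrr_at_k(ranking, {"d1"}))      # 0.5
print(ndcg_at_k(ranking, {"d1": 1}))  # 1 / log2(3), roughly 0.63
```

MRR@10 only rewards the position of the first relevant result, whereas nDCG@10 accounts for graded relevance and for every relevant result in the top ten, which is why the two are often reported side by side.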

Fine-Tuning LLaMA for Multi-Stage Text Retrieval

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

A Two-Stage Adaptation of Large Language Models for Text Ranking

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching

Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning

Leveraging LLMs for Unsupervised Dense Retriever Ranking

Query Rewriting for Retrieval-Augmented Large Language Models

Label Supervised LLaMA Finetuning

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment

Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

PMC-LLaMA: Further Finetuning LLaMA on Medical Papers

Bridging the Preference Gap between Retrievers and LLMs

ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

Self-Calibrated Listwise Reranking with Large Language Models

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval