DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task

Wenhan Liu, Yutao Zhu, Zhicheng Dou
2024-09-25
Abstract: Recently, there has been increasing interest in applying large language models (LLMs) as zero-shot passage rankers. However, few studies have explored how to select appropriate in-context demonstrations for the passage ranking task, which is the focus of this paper. Previous studies mainly use the LLM's feedback to train a retriever for demonstration selection. These studies apply the LLM to score each demonstration independently, which ignores the dependencies between demonstrations (especially important in ranking tasks), leading to inferior performance of the top-$k$ retrieved demonstrations. To mitigate this issue, we introduce a demonstration reranker to rerank the retrieved demonstrations so that the top-$k$ ranked ones are more suitable for ICL. However, generating training data for such a reranker is quite challenging. On the one hand, unlike the demonstration retriever, the training samples of the reranker need to incorporate demonstration dependencies. On the other hand, obtaining the gold ranking from the retrieved demonstrations is an NP-hard problem, which is hard to implement. To overcome these challenges, we propose a method to approximate the optimal demonstration list iteratively and utilize the LLM to score demonstration lists of varying lengths. By doing so, the search space is greatly reduced and demonstration dependencies are considered. Based on these scored demonstration lists, we further design a list-pairwise training approach, which compares a pair of lists that differ only in the last demonstration, to teach the reranker how to select the next demonstration given a previous sequence. In this paper, we propose a demonstration selection framework, DemoRank, for the ranking task and conduct extensive experiments to demonstrate its strong ability.
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

The paper "DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task" addresses how to select effective demonstrations for ranking tasks with large language models (LLMs). Specifically, it focuses on choosing appropriate demonstrations for in-context learning (ICL) so that LLMs rank passages more accurately.

### Background and Motivation

In recent years, a growing body of research has applied large language models to zero-shot passage ranking. However, few studies have explored how to select appropriate in-context demonstrations for ranking tasks. Existing methods mainly rely on feedback from the LLM to train a retriever for demonstration selection. These methods typically score each demonstration independently and ignore the dependencies between demonstrations, which are particularly important in ranking tasks. As a result, the top-k demonstrations they retrieve perform poorly when used together.

### Main Contributions

1. **Proposing the DemoRank Framework**: This is the first comprehensive discussion of effective demonstration selection for ranking tasks. The framework consists of a demonstration retriever (DRetriever) and a dependency-aware demonstration reranker (DReranker).
2. **Efficient Dependency-Aware Training Sample Construction Method**: An efficient method is proposed for constructing training samples that account for the dependencies between demonstrations while remaining time-efficient.
3. **List-Pairwise Training Method**: Based on these training samples, a list-pairwise training method compares pairs of lists that differ only in the last demonstration, teaching the reranker how to select the next demonstration given the previous sequence.

### Method Overview

1. **Demonstration Pool Construction**: Construct a demonstration pool using queries and relevant/irrelevant passages from the training set.
2. **Demonstration Retriever (DRetriever)**:
   - Use the LLM to score candidate demonstrations and obtain supervision signals.
   - Train the retriever with a multi-task learning strategy that combines a contrastive loss and a ranking loss.
3. **Dependency-Aware Demonstration Reranker (DReranker)**:
   - Construct dependency-aware training samples iteratively, taking the previously selected demonstrations into account at each iteration.
   - Train the reranker with the list-pairwise method to improve its selection ability.
4. **Inference Stage**:
   - Use the trained DRetriever to retrieve demonstrations, then use the DReranker to select the highest-scoring demonstration one step at a time to build the final demonstration list.

### Experimental Results

Experiments were conducted on multiple ranking datasets, including HotpotQA, NQ, FEVER, MS MARCO, etc. The results show that DemoRank outperforms baseline methods on all datasets, especially in few-shot ICL scenarios.

### Conclusion

By introducing a dependency-aware demonstration selection method, DemoRank significantly improves the performance of large language models in ranking tasks. The framework also points to directions for future research, particularly on how to use demonstrations effectively for in-context learning.
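
The sequential selection used at inference and the list-pairwise comparison used in training can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `greedy_select`, `list_pairwise_loss`, and `toy_score` are hypothetical, the scorer is a toy word-overlap function standing in for a trained DReranker, and the logistic form of the loss is an assumption, since the summary does not state the exact loss.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer signature (an assumption for illustration):
# (query, demonstrations chosen so far, candidate) -> score.
Scorer = Callable[[str, Sequence[str], str], float]


def greedy_select(query: str, candidates: Sequence[str],
                  score: Scorer, k: int) -> List[str]:
    """Pick up to k demonstrations one at a time.

    Each pick is scored given the demonstrations already selected,
    so dependencies between demonstrations influence the result.
    """
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: score(query, selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected


def list_pairwise_loss(score_better: float, score_worse: float) -> float:
    """Logistic pairwise loss (assumed form) for two demonstration lists
    that share a prefix and differ only in the last demonstration."""
    return math.log1p(math.exp(-(score_better - score_worse)))


def toy_score(query: str, selected: Sequence[str], candidate: str) -> float:
    """Toy stand-in for a trained reranker: reward overlap with the query,
    penalise words already covered by earlier demonstrations."""
    query_words = set(query.split())
    cand_words = set(candidate.split())
    seen_words = {w for d in selected for w in d.split()}
    return len(cand_words & query_words) - len(cand_words & seen_words)
```

With the query `"sort python list java array"` and the candidates `"python list sort"`, `"python list sort quickly"`, and `"java array"`, scoring each candidate independently would favour the two near-duplicate Python demonstrations. `greedy_select` with `toy_score` and `k=2` instead picks `"python list sort"` followed by `"java array"`, because the second pick is scored in the context of the first.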