Abstract: Recently, there has been increasing interest in applying large language models (LLMs) as zero-shot passage rankers. However, few studies have explored how to select appropriate in-context demonstrations for the passage ranking task, which is the focus of this paper. Previous studies mainly use the LLM's feedback to train a retriever for demonstration selection. These studies apply the LLM to score each demonstration independently, which ignores the dependencies between demonstrations (especially important in ranking tasks) and leads to inferior performance of the top-$k$ retrieved demonstrations. To mitigate this issue, we introduce a demonstration reranker to rerank the retrieved demonstrations so that the top-$k$ ranked ones are more suitable for ICL. However, generating training data for such a reranker is quite challenging. On the one hand, unlike the demonstration retriever, the training samples of the reranker need to incorporate demonstration dependencies. On the other hand, obtaining the gold ranking from the retrieved demonstrations is an NP-hard problem, which is hard to implement. To overcome these challenges, we propose a method to approximate the optimal demonstration list iteratively and use the LLM to score demonstration lists of varying lengths. By doing so, the search space is greatly reduced and demonstration dependencies are considered. Based on these scored demonstration lists, we further design a list-pairwise training approach, which compares a pair of lists that differ only in the last demonstration, to teach the reranker how to select the next demonstration given a previous sequence. In this paper, we propose a demonstration selection framework, DemoRank, for ranking tasks and conduct extensive experiments to demonstrate its effectiveness.
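The iterative approximation and the list-pairwise comparison described in the abstract can be illustrated with a minimal sketch. It assumes a greedy procedure: at each step every remaining candidate is appended to the current list, the LLM scores each extended list, and the best extension becomes the new prefix. The names `build_scored_lists`, `list_pairwise_pairs`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for real LLM feedback; the paper's actual procedure may differ in its details.

```python
from typing import Callable, List, Sequence, Tuple

Demo = str
ScoredList = Tuple[List[Demo], float]
ScoreFn = Callable[[Sequence[Demo]], float]


def build_scored_lists(
    candidates: Sequence[Demo], score_list: ScoreFn, max_len: int
) -> List[List[ScoredList]]:
    """Greedily grow a demonstration list, scoring every one-step extension.

    Returns one entry per iteration. Each entry holds (list, score) pairs
    that share the same prefix and differ only in the last demonstration.
    """
    prefix: List[Demo] = []
    remaining = list(candidates)
    per_step: List[List[ScoredList]] = []
    for _ in range(min(max_len, len(candidates))):
        # Score each candidate in the context of the demonstrations chosen so far.
        scored = [(prefix + [d], score_list(prefix + [d])) for d in remaining]
        per_step.append(scored)
        # Keep the best extension as the prefix for the next iteration.
        best_list, _ = max(scored, key=lambda pair: pair[1])
        prefix = best_list
        remaining.remove(best_list[-1])
    return per_step


def list_pairwise_pairs(
    per_step: List[List[ScoredList]],
) -> List[Tuple[List[Demo], List[Demo]]]:
    """Build (better_list, worse_list) pairs that differ only in the last item."""
    pairs = []
    for scored in per_step:
        for list_a, score_a in scored:
            for list_b, score_b in scored:
                if score_a > score_b:
                    pairs.append((list_a, list_b))
    return pairs


def toy_score(demos: Sequence[Demo]) -> float:
    # Hypothetical stand-in for LLM feedback: reward coverage of distinct topics,
    # where a demonstration is written as "topic:id".
    return float(len({d.split(":")[0] for d in demos}))


steps = build_scored_lists(["a:1", "a:2", "b:1"], toy_score, max_len=2)
print(list_pairwise_pairs(steps))
# → [(['a:1', 'b:1'], ['a:1', 'a:2'])]
```

With n candidates and a target length of k, this sketch scores on the order of n·k lists rather than every possible ordering, which is how an iterative scheme shrinks the search space. The second demonstration is preferred because of what the first one already covers, which is the dependency that independent scoring misses.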

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

The paper "DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task" addresses the problem of selecting effective demonstrations when large language models (LLMs) are used for ranking tasks. Specifically, it focuses on how to choose in-context demonstrations for In-Context Learning (ICL) so that the LLM ranks passages more accurately.

### Background and Motivation

In recent years, a growing body of research has applied large language models to zero-shot passage ranking. However, few studies have explored how to select appropriate in-context demonstrations for ranking tasks. Existing methods mainly rely on feedback from the LLM to train a retriever that selects demonstrations. These methods typically score each demonstration independently and ignore the dependencies between demonstrations, which matter particularly in ranking tasks. As a result, the top-k demonstrations they retrieve perform poorly.

### Main Contributions

1. **Proposing the DemoRank Framework**: The paper presents itself as the first comprehensive discussion of effective demonstration selection in ranking tasks and proposes a framework consisting of a demonstration retriever (DRetriever) and a dependency-aware demonstration reranker (DReranker).
2. **Efficient Dependency-Aware Training Sample Construction Method**: A method is proposed to construct training samples that account for the dependencies between demonstrations while remaining time-efficient.
3. **List-Pairwise Training Method**: Based on these training samples, a list-pairwise training method compares lists that differ only in the last demonstration, teaching the reranker how to select the next demonstration given the previous sequence.

### Method Overview

1. **Demonstration Pool Construction**: Construct a demonstration pool using queries and relevant/irrelevant passages from the training set.
2. **Demonstration Retriever (DRetriever)**:
   - Use the LLM to score candidate demonstrations and obtain supervision signals.
   - Train the retriever using a multi-task learning strategy that combines a contrastive loss and a ranking loss.
3. **Dependency-Aware Demonstration Reranker (DReranker)**:
   - Construct dependency-aware training samples through an iterative method that considers the dependencies between demonstrations in each iteration.
   - Train the reranker with the list-pairwise method to improve its ability to select the next demonstration.
4. **Inference Stage**:
   - Use the trained DRetriever to retrieve demonstrations, then use the DReranker to sequentially select the highest-scoring demonstration at each step to construct the final demonstration list.

### Experimental Results

Experiments were conducted on multiple ranking datasets, including HotpotQA, NQ, FEVER, and MS MARCO, among others. The results show that DemoRank outperforms baseline methods on all datasets, especially in few-shot ICL scenarios.

### Conclusion

By introducing a dependency-aware demonstration selection method, DemoRank significantly improves the performance of large language models in ranking tasks. The framework also suggests directions for future research, particularly on how to use demonstrations effectively for in-context learning.
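The two-stage inference described in the method overview (retrieve, then sequentially select) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `select_demonstrations`, `toy_retrieve`, and `toy_next_score` are hypothetical names, and the two toy functions merely stand in for a trained DRetriever and DReranker.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a retriever returning the top-n candidates for a query,
# and a reranker scoring one candidate given the demonstrations chosen so far.
RetrieveFn = Callable[[str, int], List[str]]
NextScoreFn = Callable[[str, Sequence[str], str], float]


def select_demonstrations(
    query: str,
    retrieve: RetrieveFn,
    next_score: NextScoreFn,
    pool_size: int,
    k: int,
) -> List[str]:
    """Retrieve a candidate pool, then pick demonstrations one at a time."""
    candidates = retrieve(query, pool_size)
    chosen: List[str] = []
    while candidates and len(chosen) < k:
        # Each candidate is scored conditioned on the already selected sequence.
        best = max(candidates, key=lambda d: next_score(query, chosen, d))
        chosen.append(best)
        candidates.remove(best)
    return chosen


def toy_retrieve(query: str, n: int) -> List[str]:
    # Stand-in for the retriever: a fixed pool of "topic:id" demonstrations.
    return ["a:1", "a:2", "b:1", "c:1"][:n]


def toy_next_score(query: str, prefix: Sequence[str], candidate: str) -> float:
    # Stand-in for the reranker: penalize topics that the prefix already covers.
    topic = candidate.split(":")[0]
    return 0.0 if any(p.split(":")[0] == topic for p in prefix) else 1.0


print(select_demonstrations("example query", toy_retrieve, toy_next_score, 4, 3))
# → ['a:1', 'b:1', 'c:1']
```

Because the score of each candidate depends on the current prefix, `a:2` is skipped once `a:1` has been chosen, even though the two would score the same in isolation.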
