Abstract: We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at https://github.com/ielab/llm-rankers.


### What problem does this paper attempt to solve?

This paper addresses the trade-off between efficiency and effectiveness in zero-shot document ranking with large language models (LLMs). Specifically, it focuses on the following points:

1. **Limitations of existing methods**:
   - **Pointwise methods**: Efficient but less effective.
   - **Pairwise methods**: More effective, but they incur high computational overhead and are therefore inefficient.
   - **Listwise methods**: They strike a compromise between efficiency and effectiveness, but they rely on generating the entire list of document labels, which is relatively slow in practice.
2. **Lack of fair comparison**:
   - The existing literature lacks a fair comparison of the effectiveness and efficiency of different LLM-based zero-shot ranking methods within a unified experimental framework.
3. **Proposing a new solution**:
   - The paper proposes a new set-based prompting approach (the Setwise prompting approach) that improves the efficiency of zero-shot ranking while maintaining high ranking effectiveness.

### Core idea of the solution

- **Setwise Prompting**: Comparing multiple documents at once (instead of a pair of documents) reduces the number of LLM inferences and the prompt token consumption. For example, in heap sort, the Pairwise method can compare only two documents at a time, while the Setwise method can compare several (such as 4) at once, significantly reducing the total number of comparisons.
- **Ranking with logits**: The Setwise approach is not limited to replacing Pairwise comparisons; it can also improve Listwise methods. Using the logits output by the LLM to estimate the likelihood of each document label's rank avoids generating the entire list of document labels, which improves efficiency.
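The heap sort example above can be illustrated with a minimal sketch. It is not the paper's implementation: `CountingOracle`, `pairwise_pick`, `sift_down`, and `top_k` are hypothetical names, the LLM is replaced by a mock that treats each item's integer value as its relevance, and one mock call stands for one LLM inference. The sketch counts simulated inferences for a binary heap driven by two-document comparisons and for a heap with three children per node, where each call sees up to four documents.

```python
import random
from typing import Callable, List, Sequence

PickBest = Callable[[Sequence[int]], int]


class CountingOracle:
    """Mock stand-in for an LLM call (assumption: relevance == integer value)."""

    def __init__(self) -> None:
        self.calls = 0

    def pick_best(self, group: Sequence[int]) -> int:
        self.calls += 1  # one simulated LLM inference per call
        return max(range(len(group)), key=lambda j: group[j])


def pairwise_pick(oracle: CountingOracle) -> PickBest:
    """Emulate a Pairwise ranker: every simulated call sees only two documents."""
    def pick(group: Sequence[int]) -> int:
        best = 0
        for j in range(1, len(group)):
            if oracle.pick_best([group[best], group[j]]) == 1:
                best = j
        return best
    return pick


def sift_down(heap: List[int], i: int, size: int, c: int, pick: PickBest) -> None:
    """Restore the max-heap property below node i in a heap with c children per node."""
    while True:
        first = c * i + 1
        if first >= size:
            return  # node i is a leaf
        group = [i] + list(range(first, min(first + c, size)))
        best = group[pick([heap[j] for j in group])]
        if best == i:
            return  # parent already beats all of its children
        heap[i], heap[best] = heap[best], heap[i]
        i = best


def top_k(docs: Sequence[int], k: int, c: int, pick: PickBest) -> List[int]:
    """Return the k most relevant items, best first, via heap sort."""
    heap = list(docs)
    size = len(heap)
    for i in range((size - 2) // c, -1, -1):  # build the heap bottom-up
        sift_down(heap, i, size, c, pick)
    ranked: List[int] = []
    for _ in range(min(k, size)):
        ranked.append(heap[0])
        size -= 1
        heap[0] = heap[size]  # move the last leaf to the root
        sift_down(heap, 0, size, c, pick)
    return ranked


docs = random.Random(0).sample(range(1000), 100)
pairwise, setwise = CountingOracle(), CountingOracle()
top_pairwise = top_k(docs, 10, 2, pairwise_pick(pairwise))  # 2 docs per call
top_setwise = top_k(docs, 10, 3, setwise.pick_best)  # up to 4 docs per call
print("pairwise calls:", pairwise.calls, "setwise calls:", setwise.calls)
```

Both runs return the same top 10, but the set-based run needs fewer simulated inferences because the heap is shallower and each level costs one call instead of two. A real ranker would replace `pick_best` with a prompt that asks the LLM which of the labelled passages is most relevant to the query.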
### Experimental verification

The paper demonstrates the effectiveness and efficiency of the Setwise method through extensive experiments on the TREC Deep Learning 2019 and 2020 datasets and the BEIR benchmark, across LLMs of different sizes. The results show that the Setwise method achieves strong NDCG@10 while significantly reducing the number of LLM inferences, input tokens, and generated tokens, and lowering query latency.

### Main contributions

1. Proposing the Setwise prompting method, which significantly improves the efficiency of zero-shot ranking while maintaining high ranking effectiveness.
2. Systematically evaluating existing LLM-based zero-shot ranking methods within a unified experimental framework, filling the gap in efficiency comparisons in the literature.
3. Applying the Setwise method to Listwise methods, further enhancing their efficiency and effectiveness.

Through these contributions, the paper provides valuable insights for choosing the most suitable LLM-based zero-shot ranking method for practical application scenarios.

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
