Abstract: Recent studies have demonstrated the effectiveness of using large language models (LLMs) in passage ranking. Listwise approaches, such as RankGPT, have become the new state of the art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and the relatively high latency of LLM inference. To address these issues, in this paper we propose PE-Rank, which leverages a single passage embedding as a good context compression for efficient listwise passage reranking. By treating each passage as a special token, we can directly input passage embeddings into LLMs, thereby reducing input length. Additionally, we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. To adapt the model to reranking, we employ a listwise learning-to-rank loss for training. Evaluation results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding, while maintaining competitive ranking effectiveness. The code is available at https://github.com/liuqi6777/pe_rank.

What problem does this paper attempt to address?

This paper addresses the efficiency problem that arises when large language models (LLMs) are used for passage reranking. Existing listwise methods such as RankGPT rank well, but they are limited by the maximum context length and by high inference latency, which makes them inefficient in practical applications. To overcome this, the paper proposes PE-Rank, which achieves efficient listwise passage reranking by using a single embedding per passage as a form of context compression.

The main contributions of PE-Rank are:

1. **Proposing a novel and efficient listwise reranking method**: According to the paper, PE-Rank is the first to use passage embeddings for context compression to improve the efficiency of the reranking task.
2. **Evaluation on multiple benchmark datasets**: Experimental results show that PE-Rank significantly improves inference efficiency while maintaining reranking performance comparable to uncompressed methods.
3. **Introducing a dynamic-constraint decoding strategy**: By dynamically restricting the decoding space, the decoding process is accelerated, further improving efficiency.

### Paper Background

Passage reranking is an important information retrieval and natural language processing task that aims to rank passages from a large corpus by their relevance to a user's query. Current mainstream methods usually adopt the two-stage "retrieval then reranking" paradigm: first efficiently retrieve a set of candidate passages, then refine their order with a reranker.

### Existing Problems

Although listwise methods such as RankGPT perform well in the reranking task, they face two main challenges:

- **Context length limitation**: The maximum context length of LLMs is limited, so the model cannot rank all candidate passages in a single pass and must rely on techniques such as sliding windows to complete the ranking.
- **High inference cost**: Placing the full text of every passage in the prompt significantly increases the inference cost, and the resulting latency is unacceptable in ranking scenarios.

### Solutions

To solve the above problems, PE-Rank proposes the following innovations:

- **Passage embeddings as context compression**: Use passage embeddings generated by a dense retrieval model as compressed representations that replace the original text in the LLM input.
- **Dynamic-constraint decoding strategy**: Dynamically restrict the decoding space so that decoding chooses only among the special tokens that represent passages, thereby accelerating the decoding process.
- **Modality alignment**: Use a projector as a bridge to align the embedding space of the dense retrieval model with the input embedding space of the LLM.

### Experimental Results

Evaluation of PE-Rank on multiple benchmark datasets (such as TREC DL and BEIR) shows that it significantly improves inference efficiency while maintaining reranking performance comparable to uncompressed methods. The efficiency advantage of PE-Rank is more pronounced on datasets with longer passages (such as Covid).

### Summary

By combining passage embeddings with a dynamic-constraint decoding strategy, the PE-Rank method proposed in the paper addresses the efficiency bottleneck of existing listwise reranking methods and offers a more efficient solution for practical applications.
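
The interplay between the projector and the dynamic-constraint decoding strategy can be illustrated with a minimal sketch. It is not the authors' implementation. The names `project`, `rank_with_constrained_decoding`, and `next_hidden` are hypothetical, the projector is assumed to be a plain linear map, and scoring a passage by the dot product between the decoder's hidden state and its projected embedding is an assumption made for illustration. The point it shows is that each decoding step chooses only among the passages that have not been emitted yet, rather than over the full vocabulary.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(embedding: Vector, weight: Sequence[Vector]) -> List[float]:
    """Hypothetical linear projector: retriever space -> LLM input space."""
    return [dot(row, embedding) for row in weight]


def rank_with_constrained_decoding(
    next_hidden: Callable[[List[int]], Vector],
    passage_vectors: Sequence[Vector],
) -> List[int]:
    """Greedy decoding restricted to passage tokens not yet emitted."""
    remaining = list(range(len(passage_vectors)))
    ranking: List[int] = []
    while remaining:
        # Stand-in for one LLM decoding step given the ranking so far.
        hidden = next_hidden(ranking)
        # Score only the remaining passage tokens, never the full vocabulary.
        best = max(remaining, key=lambda i: dot(hidden, passage_vectors[i]))
        ranking.append(best)
        remaining.remove(best)  # shrink the decoding space dynamically
    return ranking


# Toy usage: identity projector and a fixed hidden state (both assumptions).
identity = [[1.0, 0.0], [0.0, 1.0]]
retriever_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
projected = [project(e, identity) for e in retriever_embeddings]
ranking = rank_with_constrained_decoding(lambda emitted: [2.0, 1.0], projected)
print(ranking)  # prints [2, 0, 1]
```

In a real system `next_hidden` would be a forward pass of the LLM over the query, the projected passage embeddings, and the passages already emitted. Because the candidate set shrinks by one at every step, a list of n passages needs exactly n decoding steps, and no passage can be emitted twice.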

Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models

Self-Calibrated Listwise Reranking with Large Language Models

FIRST: Faster Improved Listwise Reranking with Single Token Decoding

Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking

InstUPR: Instruction-based Unsupervised Passage Reranking with Large Language Models

PaRaDe: Passage Ranking using Demonstrations with Large Language Models

Reranking Passages with Coarse-to-Fine Neural Retriever Enhanced by List-Context Information

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

Hybrid and Collaborative Passage Reranking.

A Two-Stage Adaptation of Large Language Models for Text Ranking

Leveraging Passage-level Cumulative Gain for Document Ranking.

Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers

LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking

Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

EmbedLLM: Learning Compact Representations of Large Language Models