Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao
2024-06-21
Abstract: Recent studies have demonstrated the effectiveness of using large language models (LLMs) in passage ranking. Listwise approaches, such as RankGPT, have become the new state of the art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and the relatively high latency of LLM inference. To address these issues, we propose PE-Rank, which leverages a single passage embedding as a good context compression for efficient listwise passage reranking. By treating each passage as a special token, we can directly input passage embeddings into LLMs, thereby reducing input length. Additionally, we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. To adapt the model to reranking, we train it with a listwise learning-to-rank loss. Evaluation results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding, while maintaining competitive ranking effectiveness. The code is available at https://github.com/liuqi6777/pe_rank.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper addresses the efficiency of using large language models (LLMs) for passage reranking. Existing listwise methods such as RankGPT rank well, but they are constrained by the maximum context length and by high inference latency, which makes them inefficient in practice. To overcome these problems, the paper proposes PE-Rank, which performs efficient listwise passage reranking by using a single embedding per passage as a form of context compression.

The main contributions of PE-Rank are:

1. **Proposing a novel and efficient listwise reranking method**: PE-Rank uses passage embeddings for context compression for the first time to improve the efficiency of the reranking task.
2. **Evaluation on multiple benchmark datasets**: Experimental results show that PE-Rank significantly improves inference efficiency while maintaining reranking performance comparable to uncompressed methods.
3. **Introducing a dynamically constrained decoding strategy**: By dynamically adjusting the decoding space, the decoding process is accelerated, further improving efficiency.

### Paper Background

Passage reranking is an important task in information retrieval and natural language processing. Its goal is to rank the passages in a large corpus by how well they answer the user's query. Current mainstream methods usually adopt the two-stage "retrieve then rerank" paradigm: a retriever first efficiently fetches a set of candidate passages, and a reranker then refines their order.
### Existing Problems

Although listwise methods such as RankGPT perform well in reranking, they face two main challenges:

- **Context length limitation**: The maximum context length of LLMs is limited, so a large candidate set cannot be ranked in a single pass, and techniques such as sliding windows are needed to complete the ranking.
- **High inference cost**: Placing the full text of every passage in the prompt significantly increases the inference cost, resulting in latency that is unacceptable in ranking scenarios.

### Solutions

To solve these problems, PE-Rank proposes the following:

- **Passage embeddings as context compression**: Passage embeddings produced by a dense retrieval model serve as compressed representations that replace the original text in the LLM input.
- **Dynamically constrained decoding strategy**: The decoding space is adjusted dynamically so that the model decodes only over the special tokens that stand for passages, which accelerates decoding.
- **Modality alignment**: A projector acts as a bridge that aligns the embedding space of the dense retrieval model with the input embedding space of the LLM.

### Experimental Results

Evaluation on multiple benchmark datasets (such as TREC DL and BEIR) shows that PE-Rank significantly improves inference efficiency while maintaining reranking performance comparable to uncompressed methods. The efficiency advantage is more pronounced on datasets with longer passages (such as Covid).

### Summary

By combining passage embeddings with a dynamically constrained decoding strategy, PE-Rank addresses the efficiency bottleneck of existing listwise reranking methods and offers a more efficient and practical option for real applications.
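
The projector and the dynamically constrained decoding described under Solutions can be illustrated with a toy sketch. It is not the authors' implementation: `project`, `constrained_rank`, and the `next_state` callback are hypothetical names, the projector is assumed to be a plain linear map, and the dot-product scoring is an assumed stand-in for the LLM's output layer. The point is only the control flow: each decoding step scores just the passage tokens that have not yet been emitted, never the full vocabulary.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def project(embedding: Vector, weight: Sequence[Vector]) -> List[float]:
    """Hypothetical linear projector from retriever space to LLM input space."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def constrained_rank(
    passage_tokens: List[Vector],
    next_state: Callable[[List[int]], Vector],
) -> List[int]:
    """Greedy decoding restricted to passage tokens that are still unranked.

    `next_state` stands in for an LLM forward pass: given the passages
    ranked so far, it returns a hidden-state vector (an assumption).
    """
    remaining = set(range(len(passage_tokens)))
    ranking: List[int] = []
    while remaining:
        state = next_state(ranking)
        # Only remaining passage tokens are scored; ties go to the lower index.
        best = max(sorted(remaining), key=lambda i: dot(state, passage_tokens[i]))
        ranking.append(best)
        remaining.remove(best)  # shrink the decoding space after each step
    return ranking


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d to 3-d projector
    retriever_embeddings = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
    tokens = [project(e, weight) for e in retriever_embeddings]
    # Toy hidden state that ignores the ranking history.
    print(constrained_rank(tokens, lambda ranked: [1.0, 0.0, 0.5]))
```

Because the candidate set shrinks by one at every step, a ranking of n passages takes exactly n decoding steps and cannot contain duplicates or tokens that do not correspond to a passage.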