Abstract: High Recall Retrieval (HRR), such as eDiscovery and medical systematic review, is a search problem that optimizes the cost of retrieving most of the relevant documents in a given collection. Iterative approaches, such as iterative relevance feedback and uncertainty sampling, have been shown to be effective under various operational scenarios. Although neural models have succeeded in other text-related tasks, linear models such as logistic regression are generally still more effective and efficient in HRR, since the model is trained on, and retrieves documents from, the same fixed collection. In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors, for HRR. Our approach combines the best of both worlds: the contextualization of pretrained language models and the efficiency of linear models. It reduces the review cost by 10% and 18% on two HRR evaluation collections under a one-phase review workflow with a target recall of 80%. The experiment is implemented with TARexp and is available at

What problem does this paper attempt to address?

The paper explores how to use Pretrained Language Models (PLMs) to improve the efficiency and effectiveness of High Recall Retrieval (HRR). Specifically, the authors use SPLADE, an efficient sparse retrieval model, to convert documents into contextualized sparse vectors, and then feed these vectors as features into a linear classifier. The core contributions of the paper are:

1. Proposing an effective sparse classification model for HRR tasks that combines the contextual understanding of pretrained language models with the efficiency of linear models.
2. Conducting comprehensive experiments under two different workflows (one-phase and two-phase) to test the effectiveness of the proposed method.
3. Performing ablation studies to analyze how the choice of pretrained language model affects the final results.

Experimental results on two HRR evaluation collections (RCV1-v2 and Jeb Bush) show that combining the contextualized features generated by SPLADE with traditional BM25 features can significantly reduce the total review cost compared to the BM25 baseline, with a maximum reduction of about 27%. The combined method also performs well across categories of varying difficulty and generality.

Overall, the paper aims to improve existing HRR techniques by combining the expressive power of pretrained language models with the efficiency of linear models, particularly in applications that require high recall, such as legal document review.
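The workflow summarized above (sparse document vectors fed to a linear classifier inside an iterative relevance feedback loop) can be sketched roughly as follows. This is a minimal illustration, not the authors' TARexp implementation: the toy vectors are hand-written stand-ins for SPLADE output, the function names (`train_logreg`, `one_phase_review`) are hypothetical, and the stopping rule uses oracle labels, whereas a real review would have to estimate recall.

```python
import math

def dot(weights, vec):
    """Dot product between a weight dict and a sparse term->weight dict."""
    return sum(weights.get(term, 0.0) * value for term, value in vec.items())

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(examples, epochs=50, lr=0.5, l2=0.01):
    """Fit L2-regularized logistic regression with plain SGD.

    examples: list of (sparse_vector, label) pairs, label in {0, 1}.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, label in examples:
            err = sigmoid(dot(weights, vec) + bias) - label
            for term, value in vec.items():
                w = weights.get(term, 0.0)
                weights[term] = w - lr * (err * value + l2 * w)
            bias -= lr * err
    return weights, bias

def one_phase_review(docs, labels, seed_id, batch_size=2, target_recall=0.8):
    """Relevance feedback: retrain, review the top-scored batch, repeat.

    Assumption: `labels` acts as an oracle reviewer and the true number of
    relevant documents is known, which is only possible in an evaluation.
    """
    total_relevant = sum(labels.values())
    reviewed = [seed_id]
    found = labels[seed_id]
    while found < target_recall * total_relevant and len(reviewed) < len(docs):
        training = [(docs[d], labels[d]) for d in reviewed]
        weights, _ = train_logreg(training)
        seen = set(reviewed)
        pool = [d for d in docs if d not in seen]
        pool.sort(key=lambda d: dot(weights, docs[d]), reverse=True)
        reviewed.extend(pool[:batch_size])
        found = sum(labels[d] for d in reviewed)
    return reviewed, found / total_relevant

# Toy collection: hand-made sparse vectors standing in for SPLADE features.
docs = {
    "d0": {"merger": 1.2, "acquire": 0.8},
    "d1": {"merger": 0.9, "deal": 0.5},
    "d2": {"acquire": 1.1, "takeover": 0.7},
    "d3": {"weather": 1.0, "rain": 0.6},
    "d4": {"football": 1.3, "match": 0.4},
    "d5": {"takeover": 1.0, "deal": 0.9},
    "d6": {"rain": 0.8, "match": 0.2},
    "d7": {"deal": 0.3, "football": 0.5},
}
labels = {"d0": 1, "d1": 1, "d2": 1, "d3": 0, "d4": 0, "d5": 1, "d6": 0, "d7": 0}

reviewed, recall = one_phase_review(docs, labels, seed_id="d0")
print(f"reviewed {len(reviewed)} of {len(docs)} documents, recall = {recall:.2f}")
```

In this sketch the review cost is simply the number of documents reviewed before the recall target is met. Combining SPLADE and BM25 features, as the summary describes, would amount to merging two such sparse dictionaries per document under distinct feature names before training; that merging step is an assumption about one possible realization and is not shown here.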

Contextualization with SPLADE for High Recall Retrieval
