Abstract: Noise in retrieved documents hinders RAG from detecting answer clues and makes inference slow and expensive. Context compression is therefore necessary to improve RAG's accuracy and efficiency. Existing context compression methods use extractive or generative models to retain the most query-relevant sentences, or apply information bottleneck theory to preserve sufficient information. However, these methods may suffer from over-compression or high computational costs. We observe that the retriever often ranks relevant documents at the top, but the exact number of documents needed to answer a query is uncertain because it depends on query complexity and retrieval quality: complex queries such as multi-hop questions may require retaining more documents than simpler queries, and a low-quality retrieval may need more documents to generate accurate outputs. Determining the minimum number of required documents (the compression rate) therefore remains a challenge for RAG. In this paper, we introduce AdaComp, a low-cost extractive context compression method that adaptively determines the compression rate based on both query complexity and retrieval quality. Specifically, we first annotate the minimum number of top-k documents necessary for the RAG system to answer the current query as the compression rate, and then construct triplets of the query, the retrieved documents, and the corresponding compression rate. We then use this triplet dataset to train a compression-rate predictor. Experiments on three QA datasets and one conversational multi-doc QA dataset show that AdaComp significantly reduces inference costs while maintaining performance nearly identical to that of uncompressed models, achieving a balance between efficiency and performance.
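
The annotation and compression steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`annotate_min_k`, `build_triplet`, `compress`) and the `is_correct` callback are hypothetical, and allowing k = 0 (no documents needed) is an assumption not stated in the abstract. The sketch assumes the documents are already ranked by the retriever and that some external check can decide whether the RAG output for a given context is correct.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: (query, context_docs) -> True if the RAG system,
# given only these documents as context, answers the query correctly.
AnswerCheck = Callable[[str, List[str]], bool]


def annotate_min_k(query: str, ranked_docs: List[str],
                   is_correct: AnswerCheck) -> Optional[int]:
    """Return the smallest k such that the top-k documents suffice.

    Returns None if no prefix of the ranking yields a correct answer.
    Assumption: k = 0 is allowed, meaning the query needs no documents.
    """
    for k in range(len(ranked_docs) + 1):
        if is_correct(query, ranked_docs[:k]):
            return k
    return None


def build_triplet(query: str, ranked_docs: List[str],
                  is_correct: AnswerCheck
                  ) -> Optional[Tuple[str, List[str], int]]:
    """Build one (query, retrieved documents, compression rate) triplet."""
    k = annotate_min_k(query, ranked_docs, is_correct)
    if k is None:
        return None  # skip queries that cannot be answered from this ranking
    return (query, ranked_docs, k)


def compress(ranked_docs: List[str], predicted_k: int) -> List[str]:
    """Keep only the top-k documents, clamping k to the valid range."""
    k = max(0, min(predicted_k, len(ranked_docs)))
    return ranked_docs[:k]


# Toy usage: the "RAG system" is correct once the answer-bearing
# document appears in the context.
docs = ["Berlin is in Germany.", "Paris is the capital of France.", "Noise."]
check = lambda q, ctx: any("Paris" in d for d in ctx)
triplet = build_triplet("What is the capital of France?", docs, check)
```

In a full pipeline, triplets like this would serve as training data for the compression-rate predictor, and `compress` would be applied at inference time with the predictor's output in place of the annotated k.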

Effective In-Context Example Selection through Data Compression

In-Context Compositional Generalization for Large Vision-Language Models

Finding Support Examples for In-Context Learning

In-Context Demonstration Selection with Cross Entropy Difference

Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

In-Context Former: Lightning-fast Compressing Context for Large Language Model

Context Compression for Auto-regressive Transformers with Sentinel Tokens

Adapting Language Models to Compress Contexts

Evaluating Large Language Models for Generalization and Robustness via Data Compression

Context Compression and Extraction: Efficiency Inference of Large Language Models

Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation

Extending Context Window of Large Language Models via Semantic Compression

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

How Do In-Context Examples Affect Compositional Generalization?

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Unifying Demonstration Selection and Compression for In-Context Learning

Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

Compositional Exemplars for In-context Learning

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models