Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and tend to be lost in the middle in long context scenarios, leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a dual-slope ratio allocator to dynamically assign compression ratios and open-book ratios, a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experiment results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses two main challenges that large language models (LLMs) face in long-context scenarios:

1. **Excessively long input sequences and redundant information**: Long-context inputs contain a large amount of redundant information and may exceed the context-window limit of the LLM. This increases the computational burden and may also degrade model performance.
2. **Sensitivity to the position of key information**: LLMs are very sensitive to where key information appears. If it sits in the middle of the context, model performance declines significantly, which is the so-called "lost in the middle" phenomenon.

To solve these problems, the paper proposes a training-free prompt compression method named **Perception Compressor**. The method consists of three main components:

- **Dual-slope Ratio Allocator**: Dynamically allocates different compression ratios and open-book ratios to different components of the prompt (such as instructions, demonstrations, and questions) to control how much each part is compressed and to retain key information.
- **Perception Retriever**: Uses guiding questions and instructions to re-order the demonstrations so that the most relevant ones are retained first, thereby increasing the density of key information.
- **Semi-guided Iterative Compression**: Retains key-information tokens (KITs) during compression while removing non-key-information tokens (NITs) to reduce noise.

Together, these components reduce the input length, improve LLM performance in long-context scenarios, and keep key information from being omitted or ignored.
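The re-ordering idea behind the Perception Retriever can be illustrated with a minimal sketch. It is not the paper's implementation: the `Demo` class, the `score` field, the whitespace token count, and the `token_budget` parameter are all assumptions made for illustration, and the relevance scores are taken as given rather than computed from a language model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    text: str
    score: float  # hypothetical relevance score; higher means more relevant


def reorder_and_truncate(demos: List[Demo], token_budget: int) -> List[Demo]:
    """Rank demonstrations by relevance and keep them while they fit the budget."""
    ranked = sorted(demos, key=lambda d: d.score, reverse=True)
    kept: List[Demo] = []
    used = 0
    for demo in ranked:
        # Whitespace splitting stands in for a real tokenizer (assumption).
        n_tokens = len(demo.text.split())
        if used + n_tokens > token_budget:
            continue  # this one does not fit; a shorter one may still fit later
        kept.append(demo)
        used += n_tokens
    return kept


if __name__ == "__main__":
    demos = [Demo("a b c", 0.2), Demo("d e", 0.9), Demo("f g h i", 0.5)]
    for demo in reorder_and_truncate(demos, token_budget=5):
        print(demo.score, demo.text)
```

Placing the highest-scoring demonstrations first is one simple way to keep relevant content away from the middle of the prompt; how the paper combines this ordering with its ratio allocator is not shown here.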
### Formula summary

- **Conditional perplexity calculation** (per demonstration \(d_k\) and question \(q_j\)):
  \[ r_{k,j}=L_{\text{ins}}+L_{q_j}+L_r\sum_{i = 1}^{X}g(\{\text{ins},q_j,r\}_i)\log p(\{\text{ins},q_j,r\}_i \mid d_k) \]
  where \(g(\cdot)\) represents the probability of the true distribution, and \(\{\text{ins},q_j,r\}_i\) is the \(i\)-th token after concatenation.
- **Perceptual perplexity calculation**:
  \[ r_k=\sum_{i = 0}^{n}a_i\cdot r_{k,i} \]
  where \(a_i\) is the semantic similarity between \(q_0\) and \(q_i\).
- **Conditional contrastive perplexity calculation**:
  \[ Q(s_{j,i})=g(s_{j,i})\log\frac{p(s_{j,i} \mid q_0,e_{s_{<j}},s_{j,<i})}{p(s_{j,i} \mid e_{s_{<j}},s_{j,<i})} \]
- **Conditional perplexity calculation** (per token \(s_{j,i}\)):
  \[ P(s_{j,i})=g(s_{j,i})\log p(s_{j,i} \mid e_{s_{<j}},s_{j,<i}) \]

With these formulas, the Perception Compressor scores and compresses the prompt content so that key information is retained while redundant information is removed.
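The perceptual perplexity and conditional contrastive perplexity formulas reduce to simple arithmetic once the per-question scores and token log-probabilities are available. The sketch below assumes those inputs are already computed by some language model; the function names are hypothetical, and the default `g=1.0` is an assumption standing in for the true-distribution weight \(g(\cdot)\).

```python
import math
from typing import Sequence


def perceptual_perplexity(similarities: Sequence[float],
                          conditional_scores: Sequence[float]) -> float:
    """r_k = sum_i a_i * r_{k,i}: similarity-weighted sum of per-question scores."""
    if len(similarities) != len(conditional_scores):
        raise ValueError("need one similarity a_i per score r_{k,i}")
    return sum(a * r for a, r in zip(similarities, conditional_scores))


def contrastive_score(logp_with_question: float,
                      logp_without_question: float,
                      g: float = 1.0) -> float:
    """Q = g * log(p_with / p_without), computed as a difference of log-probs."""
    return g * (logp_with_question - logp_without_question)


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    print(perceptual_perplexity([1.0, 0.5], [-2.0, -4.0]))
    print(contrastive_score(math.log(0.5), math.log(0.25)))
```

Working with log-probabilities directly avoids dividing two very small probabilities. A token whose probability rises when the question \(q_0\) is added to the context gets a positive contrastive score.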

Perception Compressor: A training-free prompt compression method in long context scenarios

500xCompressor: Generalized Prompt Compression for Large Language Models

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

Prompt Compression for Large Language Models: A Survey

Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression

SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Discrete Prompt Compression With Reinforcement Learning

Learning to Compress Prompt in Natural Language Formats

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

Characterizing Prompt Compression Methods for Long Context Inference

Parse Trees Guided LLM Prompt Compression

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

LanguaShrink: Reducing Token Overhead with Psycholinguistics

Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning