Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and tend to be lost in the middle in long context scenarios, leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a dual-slope ratio allocator to dynamically assign compression ratios and open-book ratios, a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experiment results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.
### What problems does this paper attempt to solve?
This paper addresses two main challenges that large language models (LLMs) encounter in long-context scenarios:
1. **Excessively long input sequences and redundant information**: In long-context scenarios, input sequences are very long and contain a large amount of redundant information, so they may exceed the context-window limit of the LLM. This increases the computational burden and may also degrade model performance.
2. **Sensitivity to the position of key information**: LLMs are very sensitive to where key information appears. If it sits in the middle of the context, model performance declines significantly, which is the so-called "lost in the middle" phenomenon.
To solve these problems, the paper proposes a training-free prompt compression method named **Perception Compressor**. It consists of the following three main components:
- **Dual-slope Ratio Allocator**: Dynamically allocates different compression ratios and open-book ratios to the different components of the prompt (such as instructions, demonstrations, and questions) to control how much each is compressed and to retain key information.
- **Perception Retriever**: Uses guiding questions and the instruction to re-order the demonstrations, so that the most relevant demonstrations are retained first, thereby increasing the density of key information.
- **Semi-guided Iterative Compression**: Retains key-information tokens (KITs) during compression while removing non-key-information tokens (NITs) to reduce noise.
Together, these components let Perception Compressor reduce the input length and improve LLM performance in long-context scenarios while ensuring that key information is neither omitted nor ignored.
### Formula summary
- **Conditional perplexity calculation**:
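\[
r_{k,j}=\frac{1}{L_{\text{ins}}+L_{q_j}+L_r}\sum_{i = 1}^{L_{\text{ins}}+L_{q_j}+L_r}g(\{\text{ins},q_j,r\}_i)\log p(\{\text{ins},q_j,r\}_i|d_k)
\]
where \(g(\cdot)\) represents the probability of the true distribution, \(\{\text{ins},q_j,r\}_i\) is the \(i\)-th token after concatenation, and \(L_{\text{ins}}\), \(L_{q_j}\), \(L_r\) are read here as the token lengths of the three concatenated parts.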
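As a rough illustration of the quantity \(r_{k,j}\), the sketch below assumes that \(g(\cdot)\) is one-hot on the observed token, so each term of the sum reduces to the log-probability the model assigns to that token given demonstration \(d_k\). It also assumes the sum is normalised by the number of tokens. The function name `conditional_score` and the example probabilities are hypothetical; in practice the log-probabilities would come from a language model.

```python
import math

def conditional_score(token_logprobs):
    """Average log-probability of the concatenated tokens given a demonstration.

    token_logprobs: list of log p(token_i | d_k), one entry per token of the
    concatenated sequence (instruction, guiding question, r).
    Assumes a one-hot true distribution g and length normalisation.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)

# Hypothetical per-token probabilities, for illustration only.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
score = conditional_score(logprobs)

# The conventional perplexity is the exponential of the negated average.
perplexity = math.exp(-score)
```

Under these assumptions a demonstration that makes the instruction and question more predictable yields a higher average log-probability, and therefore a lower perplexity.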
- **Perceptual perplexity calculation**:
\[
r_k=\sum_{i = 0}^{n}a_i\cdot r_{k,i}
\]
where \(a_i\) is the semantic similarity between \(q_0\) and \(q_i\).
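The weighted sum above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `perceptual_scores` and `rank_demonstrations` are hypothetical, the numbers are made up, and the ranking direction (higher score first) is an assumption that fits scores defined as log-probabilities.

```python
def perceptual_scores(r, a):
    """Combine per-question scores into one score per demonstration.

    r[k][i]: score r_{k,i} of demonstration k under question q_i
             (index 0 is the original question q_0).
    a[i]:    semantic similarity between q_0 and q_i.
    Returns the list of r_k = sum_i a[i] * r[k][i].
    """
    for row in r:
        if len(row) != len(a):
            raise ValueError("each row of r needs one entry per weight in a")
    return [sum(a_i * r_ki for a_i, r_ki in zip(a, row)) for row in r]

def rank_demonstrations(r, a, higher_is_better=True):
    """Return demonstration indices ordered from most to least relevant.

    The ordering direction is an assumption of this sketch.
    """
    scores = perceptual_scores(r, a)
    return sorted(range(len(scores)), key=lambda k: scores[k],
                  reverse=higher_is_better)

# Hypothetical values: 3 demonstrations, the original question plus 1 guiding question.
a = [1.0, 0.5]
r = [[-2.0, -1.0],
     [-0.5, -3.0],
     [-1.0, -1.0]]
scores = perceptual_scores(r, a)
order = rank_demonstrations(r, a)
```

Because \(a_0\) is the similarity of \(q_0\) with itself, the original question naturally carries the largest weight, and guiding questions contribute in proportion to how close they are to it.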
- **Conditional contrastive perplexity calculation**:
\[
Q(s_{j,i})=g(s_{j,i})\log\frac{p(s_{j,i}|q_0,\tilde{s}_{<j},s_{j,<i})}{p(s_{j,i}|\tilde{s}_{<j},s_{j,<i})}
\]
- **Conditional perplexity calculation** (token level):
\[
P(s_{j,i})=g(s_{j,i})\log p(s_{j,i}|\tilde{s}_{<j},s_{j,<i})
\]
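To make the contrastive score concrete, the sketch below assumes a one-hot \(g\), so \(Q(s_{j,i})\) reduces to the difference between a token's log-probability with and without the question \(q_0\) in the context. The selection rule (keep the highest-scoring fraction of tokens, in original order) is a simplification chosen for illustration, not the paper's semi-guided procedure. The names `contrastive_scores` and `keep_top_tokens` and all numbers are hypothetical.

```python
def contrastive_scores(logp_with_q, logp_without_q):
    """Per-token log-ratio: log p(token | q_0, context) - log p(token | context)."""
    if len(logp_with_q) != len(logp_without_q):
        raise ValueError("both inputs need one log-probability per token")
    return [w - wo for w, wo in zip(logp_with_q, logp_without_q)]

def keep_top_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, preserving their order.

    This top-fraction rule is an assumption of the sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("need one score per token")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return [tokens[i] for i in sorted(top)]

# Hypothetical log-probabilities for one segment.
tokens = ["The", "capital", "of", "France", "is", "Paris"]
logp_with_q = [-1.0, -0.5, -0.2, -0.4, -0.3, -0.1]
logp_without_q = [-1.0, -2.0, -0.2, -2.5, -0.3, -3.0]

q_scores = contrastive_scores(logp_with_q, logp_without_q)
kept = keep_top_tokens(tokens, q_scores, keep_ratio=0.5)
```

Tokens whose probability rises sharply once the question is visible get a large positive score and behave like key-information tokens. Tokens the model predicts equally well either way score near zero and are the first candidates for removal.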
These scores let Perception Compressor rank demonstrations and evaluate individual tokens, so that key information is retained while redundant information is removed.