Abstract: In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Inspired by these findings, we propose LongLLMLingua for prompt compression towards improving LLMs' perception of the key information to simultaneously address the three challenges. Our extensive evaluation across various long context scenarios demonstrates that LongLLMLingua not only enhances performance but also significantly reduces costs and latency. For instance, in the NaturalQuestions benchmark, LongLLMLingua boosts performance by up to 21.4% with around 4x fewer tokens in GPT-3.5-Turbo, leading to substantial cost savings. It achieves a 94.0% cost reduction in the LooGLE benchmark. Moreover, when compressing prompts of about 10k tokens at ratios of 2x-6x, LongLLMLingua can accelerate end-to-end latency by 1.4x-2.6x. Our code is available at https://aka.ms/LongLLMLingua.

What problem does this paper attempt to address?

This paper addresses three main challenges that large language models (LLMs) face in long context scenarios:

1. **Higher computational cost**: Long prompts cost more to process, both financially and in latency.
2. **Performance degradation**: Irrelevant and redundant information in a long prompt can undermine LLM performance.
3. **Position bias**: The position of key information in the prompt significantly affects model performance, and information in the middle of the prompt is especially easy for the model to overlook.

To meet these challenges, the authors propose **LongLLMLingua**, which uses prompt compression to improve the LLM's perception of key information and thereby addresses all three problems at once. Its main contributions are:

1. **A question-aware coarse-to-fine compression method** that increases the density of key information in the prompt.
2. **A document reordering strategy** that minimizes the effect of position bias.
3. **Dynamic compression ratios** that give precise control across the coarse-grained and fine-grained compression levels.
4. **A post-compression subsequence recovery strategy** that improves the integrity of key information.
5. **An extensive evaluation on multiple benchmarks**, namely NaturalQuestions, LongBench, ZeroSCROLLS, MuSiQue, and LooGLE, covering a variety of long context scenarios.

The experimental results show that prompts compressed by LongLLMLingua outperform the original prompts in task performance, cost efficiency, and system latency. LongLLMLingua therefore improves LLM performance in long context scenarios while also significantly reducing computational cost and latency.
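The document reordering and dynamic compression ratio ideas can be illustrated with a toy sketch. This is not the authors' implementation: the relevance scores are assumed to be supplied by the caller (the summary does not say how they are computed), the descending sort order and the linear ratio schedule are assumptions, and keeping the first words of each document is a stand-in for real token-level compression. The names `reorder_by_relevance`, `keep_ratios`, `compress`, `base_ratio`, and `delta` are hypothetical.

```python
from typing import List, Tuple


def reorder_by_relevance(docs: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Sort documents so the highest-scoring ones come first (assumed order)."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


def keep_ratios(num_docs: int, base_ratio: float, delta: float) -> List[float]:
    """Assumed linear schedule: the top-ranked document keeps base_ratio + delta
    of its words, the last-ranked keeps base_ratio - delta, clamped to [0, 1]."""
    if num_docs == 1:
        return [min(max(base_ratio, 0.0), 1.0)]
    ratios = []
    for rank in range(num_docs):
        shift = delta * (1.0 - 2.0 * rank / (num_docs - 1))
        ratios.append(min(max(base_ratio + shift, 0.0), 1.0))
    return ratios


def compress(docs: List[str], scores: List[float],
             base_ratio: float = 0.5, delta: float = 0.25) -> List[str]:
    """Reorder documents by relevance, then shrink each one to its own budget."""
    ranked = reorder_by_relevance(docs, scores)
    ratios = keep_ratios(len(ranked), base_ratio, delta)
    compressed = []
    for (doc, _score), ratio in zip(ranked, ratios):
        words = doc.split()
        keep = max(1, round(len(words) * ratio))
        # Placeholder for fine-grained compression: keep only the leading words.
        compressed.append(" ".join(words[:keep]))
    return compressed


if __name__ == "__main__":
    docs = ["a b c d e f g h", "i j k l m n o p", "q r s t u v w x"]
    scores = [0.1, 0.9, 0.5]  # hypothetical relevance scores
    for line in compress(docs, scores):
        print(line)
```

In this sketch the most relevant document moves to the front and keeps the largest share of its words, while less relevant documents are compressed more aggressively, which mirrors the intent of contributions 2 and 3 in the list above.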

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

500xCompressor: Generalized Prompt Compression for Large Language Models

LanguaShrink: Reducing Token Overhead with Psycholinguistics

Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself

Parse Trees Guided LLM Prompt Compression

Prompt Compression for Large Language Models: A Survey

Learning to Compress Prompt in Natural Language Formats

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Efficient Prompting Methods for Large Language Models: A Survey

Network-aided Efficient Large Language Model Services With Denoising-inspired Prompt Compression

Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts

Extending Context Window of Large Language Models via Semantic Compression

Discrete Prompt Compression With Reinforcement Learning