Abstract: In learned image compression, the context model plays a pivotal role in capturing dependencies among latent representations. To reduce the decoding time of the serial autoregressive context model, the parallel context model has been proposed as an alternative that requires only two passes during decoding, enabling efficient image compression in real-world scenarios. However, its incomplete causal context degrades performance. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of the information used for context prediction and decoding. Based on this analysis, we propose the \textbf{Corner-to-Center transformer-based Context Model (C$^3$M)}, designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we adopt a logarithmic-based prediction order that progressively predicts more context features from the corners to the center. In addition, to enlarge the receptive field of the analysis and synthesis transforms, we use the Long-range Crossing Attention Module (LCAM) in the encoder and decoder, which captures long-range semantic information by assigning different window shapes to different channels. Extensive experiments show that the proposed method is effective and outperforms state-of-the-art parallel methods. Finally, based on subjective analysis, we suggest that improving the representation of fine details in transformer-based image compression is a promising direction for future work.
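To make the trade-off between decoding passes and available context concrete, the sketch below compares two decoding schedules over an h x w latent grid. It assumes the two-pass parallel model is the common checkerboard (anchor/non-anchor) context model. The corner-to-center function is a hypothetical illustration, not the exact ordering used by C$^3$M. It assigns each position a step index from the base-2 logarithm of its Chebyshev distance to the nearest corner. All function names are illustrative.

```python
# Illustrative decoding schedules over an h x w latent grid.
# Each function returns a grid of step indices: positions sharing an index
# are decoded in parallel, conditioned on everything decoded in earlier steps.


def checkerboard_schedule(h, w):
    """Two-pass parallel schedule: step 0 = anchors, step 1 = non-anchors."""
    return [[(i + j) % 2 for j in range(w)] for i in range(h)]


def corner_to_center_schedule(h, w):
    """HYPOTHETICAL corner-to-center order (not the paper's exact rule).

    d is the Chebyshev distance to the nearest corner; the step index is
    d.bit_length(), i.e. 0 for corners and floor(log2(d)) + 1 otherwise.
    Later steps cover exponentially wider rings, so the number of passes
    grows only logarithmically with the grid size.
    """
    sched = []
    for i in range(h):
        row = []
        for j in range(w):
            d = max(min(i, h - 1 - i), min(j, w - 1 - j))
            row.append(d.bit_length())
        sched.append(row)
    return sched


def num_passes(sched):
    """Number of sequential decoding passes a schedule needs."""
    return max(max(row) for row in sched) + 1


def positions_per_step(sched):
    """How many latent positions are decoded at each step."""
    counts = [0] * num_passes(sched)
    for row in sched:
        for step in row:
            counts[step] += 1
    return counts


if __name__ == "__main__":
    print(num_passes(checkerboard_schedule(8, 8)))              # 2
    print(positions_per_step(corner_to_center_schedule(8, 8)))  # [4, 12, 48]
    print(num_passes(corner_to_center_schedule(64, 64)))        # 6
```

On an 8 x 8 grid, the hypothetical order needs three passes and decodes 4, 12, and 48 positions in turn. On a 64 x 64 grid it needs six passes, compared with 4,096 for a serial raster-scan autoregressive model, while the checkerboard always needs two. Each extra pass lets later positions condition on more already-decoded neighbors. This is the Quantity side of the context trade-off discussed above.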

QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory

QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression

Context Compression and Extraction: Efficiency Inference of Large Language Models

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

Fast Extraction of Word Embedding from Q-contexts

Perception Compressor: A training-free prompt compression method in long context scenarios

In-Context Former: Lightning-fast Compressing Context for Large Language Model

Compressing Lengthy Context With UltraGist

Efficient Large Multi-modal Models via Visual Context Compression

Context Compression for Auto-regressive Transformers with Sentinel Tokens

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

Extending Context Window of Large Language Models via Semantic Compression

Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering

Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering

Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

500xCompressor: Generalized Prompt Compression for Large Language Models

UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference