Abstract: Learned image compression methods have shown superior rate-distortion performance and remarkable potential compared to traditional compression methods. Most existing learned approaches use stacked convolution or window-based self-attention for transform coding, which aggregate spatial information in a fixed range. In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in a content-conditioned range and thereby aid the transform. With the adaptive aggregation strategy and the weight-sharing mechanism, our method achieves promising transform capability with acceptable model complexity. Besides, following recent progress in entropy models, we define a generalized coarse-to-fine entropy model that considers the coarse global context, the channel-wise context, and the spatial context. Based on it, we introduce dynamic kernels in the hyper-prior to generate a more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model based on an investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
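
The idea of aggregating over a content-conditioned range can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function names, the single-channel feature map, the square kernel grid, and the zero padding are all assumptions made for illustration. In a real model the offsets would be predicted from the features by a small network; here they are passed in directly. Each output position samples the input at its regular kernel grid shifted by per-position offsets (with bilinear interpolation for fractional positions) and combines the samples with one set of weights shared across all positions.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2D map at a fractional (y, x); positions outside count as zero."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    # Blend the four integer neighbours by their bilinear weights.
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * feat[yy, xx]
    return val

def adaptive_aggregate(feat, offsets, weights):
    """Offset-based aggregation (hypothetical helper, for illustration only).

    feat:    (H, W) feature map
    offsets: (H, W, K, 2) per-position (dy, dx) shifts, assumed to be
             predicted from the content elsewhere
    weights: (K,) kernel weights shared by every spatial position;
             K is assumed to be a perfect square (e.g. 9 for a 3x3 grid)
    """
    h, w = feat.shape
    k = weights.shape[0]
    r = int(round(np.sqrt(k))) // 2
    # Regular sampling grid of an ordinary convolution kernel.
    base = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for n, (dy, dx) in enumerate(base):
                oy, ox = offsets[i, j, n]
                acc += weights[n] * bilinear_sample(feat, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets set to zero this reduces to an ordinary fixed-range convolution, which is the baseline behaviour the abstract contrasts against; nonzero offsets let each position reach outside that fixed window.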

What problem does this paper attempt to address?

The paper aims to address the rate-distortion performance issue in image compression, particularly by improving spatial aggregation capability and entropy models to surpass traditional image compression methods. Specifically:

- **Research Background**: Traditional image compression methods (such as JPEG, BPG, and VVC) are effective but limited by fixed frameworks and lack of flexibility, making it difficult to adapt to different image content. In recent years, deep learning-based image compression methods (Learned Image Compression, LIC) have shown superior performance due to their end-to-end optimization approach.
- **Main Issues**: Existing LIC methods based on CNN or window self-attention mechanisms have a fixed spatial information aggregation range during transform coding, and CNN weights are fixed and unchangeable, making it impossible to dynamically adjust according to content. Additionally, while entropy models can capture contextual information, they have limitations in parallel computing.
- **Solution**: This paper proposes an adaptive spatial aggregation method based on dynamic kernels, achieving more flexible spatial information aggregation by generating content-related offsets and shared weights. Furthermore, a general coarse-to-fine entropy model is defined, and dynamic kernels are introduced to enhance global context expression capability. An asymmetric spatial-channel entropy model is also proposed to reduce statistical redundancy.
- **Experimental Results**: Experiments show that this method outperforms current state-of-the-art learning-based methods in rate-distortion performance on three benchmark datasets while maintaining reasonable model complexity.

In summary, by introducing dynamic kernels and improved entropy models, this paper aims to enhance the flexibility and efficiency of deep learning-based image compression techniques, thereby improving compression performance while keeping computational complexity manageable.
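
To make the link between a better entropy model and a better rate-distortion trade-off concrete, the following is a minimal sketch under stated assumptions. The summary does not specify the probability model, so a per-symbol Gaussian with predicted mean and scale is assumed here (a common choice in learned compression), and the function names, the MSE distortion, and the weighting factor `lam` are hypothetical. The sketch estimates the bits for quantized latents and combines rate and distortion into one cost of the form R + lambda * D.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def estimated_bits(symbols, mus, sigmas, eps=1e-9):
    """Estimated code length of integer-quantized latents.

    Assumption: each symbol y is modelled by a Gaussian whose mean and
    scale come from the entropy model's context; its probability is the
    Gaussian mass on the interval [y - 0.5, y + 0.5].
    """
    total = 0.0
    for y, mu, s in zip(symbols, mus, sigmas):
        p = gaussian_cdf(y + 0.5, mu, s) - gaussian_cdf(y - 0.5, mu, s)
        total += -math.log2(max(p, eps))  # clamp to avoid log(0)
    return total

def rd_cost(bits, num_pixels, mse, lam):
    """Rate-distortion cost: bits per pixel plus lam times distortion (MSE assumed)."""
    return bits / num_pixels + lam * mse
```

In this toy setting, a context that predicts the mean more accurately or the scale more tightly assigns higher probability to the actual symbol and therefore costs fewer bits, which is the sense in which richer context reduces statistical redundancy.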

Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Coarse-to-Fine Hyper-Prior Modeling for Learned Image Compression

Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

Learned Image Compression with Large Capacity and Low Redundancy of Latent Representation

Learning Content-Weighted Deep Image Compression

Learned Image Compression for Both Humans and Machines Via Dynamic Adaptation

Learning Image and Video Compression through Spatial-Temporal Energy Compaction

AFEC: Adaptive Feature Extraction Modules for Learned Image Compression

Learning-Based Scalable Image Compression With Latent-Feature Reuse and Prediction

Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Learning-Based End-to-End Video Compression with Spatial-Temporal Adaptation

Joint Global and Local Hierarchical Priors for Learned Image Compression

Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation

Spatially adaptive image compression using a tiled deep network

Learned Video Compression via Heterogeneous Deformable Compensation Network

Learned image compression via neighborhood-based attention optimization and context modeling with multi-scale guiding

Channel-wise Autoregressive Entropy Models for Learned Image Compression

Perceptual-oriented Learned Image Compression with Dynamic Kernel

Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering

Hybrid Model-based / Data-driven Graph Transform for Image Coding