Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Huairui Wang, Nianxiang Fu, Zhenzhong Chen, Shan Liu
2023-08-17
Abstract: Learned image compression methods have shown superior rate-distortion performance and remarkable potential compared to traditional compression methods. Most existing learned approaches use stacked convolutions or window-based self-attention for transform coding, which aggregate spatial information within a fixed range. In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range to aid the transform. With the adaptive aggregation strategy and a weight-sharing mechanism, our method can achieve promising transform capability with acceptable model complexity. In addition, following recent progress in entropy modeling, we define a generalized coarse-to-fine entropy model that considers the coarse global context, the channel-wise context, and the spatial context. Based on it, we introduce the dynamic kernel into the hyper-prior to generate a more expressive global context. Furthermore, based on an investigation of the spatial characteristics of the grouped latents, we propose an asymmetric spatial-channel entropy model. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to state-of-the-art learning-based methods.
Image and Video Processing, Computer Vision and Pattern Recognition, Multimedia
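The adaptive spatial aggregation described in the abstract can be sketched as a deformable-convolution-style operator. Each kernel tap is displaced by a content-conditioned offset, the displaced location is read by bilinear interpolation, and a single set of kernel weights is shared by every position. The NumPy sketch below assumes that formulation. `predict_offsets`, its per-pixel linear projection `proj`, and the `max_shift` bound are hypothetical stand-ins for the paper's offset-generation network. The paper's operator may also differ in details such as channel grouping or modulation.

```python
import numpy as np

def bilinear_sample(x, py, px):
    """Sample features x (C, H, W) at fractional (py, px); zero outside the map."""
    C, H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    out = np.zeros(C)
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            wy = 1.0 - abs(py - yy)  # bilinear weight along y
            wx = 1.0 - abs(px - xx)  # bilinear weight along x
            if 0 <= yy < H and 0 <= xx < W:
                out += wy * wx * x[:, yy, xx]
    return out

def predict_offsets(x, proj, max_shift=2.0):
    """Hypothetical content-conditioned offset head: a per-pixel linear map from
    C_in features to 2*k*k offsets (dy, dx per tap), squashed to +/- max_shift."""
    return max_shift * np.tanh(np.einsum("oc,chw->ohw", proj, x))

def dynamic_kernel_aggregate(x, weight, offsets):
    """Deformable-style aggregation (a sketch, not the paper's exact operator).

    x:       (C_in, H, W) input features
    weight:  (C_out, C_in, k, k) kernel weights, shared across all positions
    offsets: (2*k*k, H, W); channel 2t is dy and 2t+1 is dx for tap t
    """
    C_out, C_in, k, _ = weight.shape
    _, H, W = x.shape
    r = k // 2
    out = np.zeros((C_out, H, W))
    for i in range(H):
        for j in range(W):
            for t in range(k * k):
                ky, kx = divmod(t, k)
                # Regular grid location of this tap, shifted by its learned offset.
                py = i + ky - r + offsets[2 * t, i, j]
                px = j + kx - r + offsets[2 * t + 1, i, j]
                v = bilinear_sample(x, py, px)          # (C_in,)
                out[:, i, j] += weight[:, :, ky, kx] @ v  # shared weights
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))          # C_in = 4 feature map
    weight = rng.standard_normal((6, 4, 3, 3))  # shared 3x3 kernel, C_out = 6
    proj = 0.1 * rng.standard_normal((18, 4))   # hypothetical offset head
    y = dynamic_kernel_aggregate(x, weight, predict_offsets(x, proj))
    print(y.shape)  # (6, 8, 8)
```

With all offsets set to zero, the operator reduces to an ordinary zero-padded convolution, which makes a convenient sanity check. The explicit loops favor readability over speed.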
What problem does this paper attempt to address?
The paper aims to improve the rate-distortion performance of learned image compression, chiefly by strengthening the spatial aggregation capability of the transform and the expressiveness of the entropy model, so that learned codecs further surpass traditional image compression methods. Specifically:

- **Research Background**: Traditional image compression methods (such as JPEG, BPG, and VVC) are effective, but their hand-designed, fixed coding frameworks lack flexibility and are difficult to adapt to different image content. In recent years, deep learning-based image compression methods (Learned Image Compression, LIC) have shown superior performance thanks to end-to-end optimization.
- **Main Issues**: Existing LIC methods built on CNNs or window-based self-attention aggregate spatial information over a fixed range during transform coding, and CNN weights are fixed after training, so the aggregation cannot adjust to the content. In addition, context-based entropy models capture contextual information well, but context models that decode symbols sequentially limit parallel computation.
- **Solution**: The paper proposes adaptive spatial aggregation based on dynamic kernels: content-conditioned kernel offsets decide where each kernel samples, while the kernel weights are shared, giving more flexible spatial aggregation at acceptable complexity. It also defines a generalized coarse-to-fine entropy model (coarse global context, channel-wise context, and spatial context), introduces dynamic kernels into the hyper-prior to produce a more expressive global context, and proposes an asymmetric spatial-channel entropy model to reduce statistical redundancy while maintaining coding efficiency.
- **Experimental Results**: Experiments show that the method outperforms current state-of-the-art learning-based methods in rate-distortion performance on three benchmark datasets while keeping model complexity reasonable.

In summary, by introducing dynamic kernels and improved entropy models, the paper enhances the flexibility and efficiency of learned image compression, improving compression performance while keeping computational complexity manageable.
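The coarse-to-fine context ordering and the parallelism argument can be sketched in plain Python. The code assumes a mean-scale Gaussian conditional for estimating the rate. That choice is common in learned compression but is not confirmed here as the paper's exact entropy model. The code also assumes a checkerboard split for spatial context. `decoding_schedule`, its `spatial_groups` parameter, and the group count in the usage are hypothetical. Giving spatial context to only some channel groups mimics the idea of an asymmetric spatial-channel model and does not reproduce the paper's actual configuration.

```python
import math

def gaussian_bits(y_hat, mu, sigma):
    """Estimated bits for integer symbol y_hat under a discretized Gaussian
    N(mu, sigma^2), integrated over the bin [y_hat - 0.5, y_hat + 0.5].
    Assumption: mean-scale Gaussian conditional, as in many LIC models."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(y_hat + 0.5) - cdf(y_hat - 0.5), 1e-9)  # floor avoids log2(0)
    return -math.log2(p)

def checkerboard(h, w):
    """Anchor mask for a two-pass spatial context: True where (i + j) is even.
    Anchors are decoded first; non-anchors then use anchors as spatial context."""
    return [[(i + j) % 2 == 0 for j in range(w)] for i in range(h)]

def decoding_schedule(num_groups, spatial_groups):
    """Sequential steps of a hypothetical coarse-to-fine entropy model.

    Step 0: hyper-latent z (coarse global context).
    Then channel group g, conditioned on z and groups 0..g-1 (channel-wise context).
    Groups in `spatial_groups` (hypothetical parameter) are split into anchor /
    non-anchor checkerboard passes (spatial context); the others are decoded in
    a single pass, giving an asymmetric spatial-channel ordering.
    """
    steps = [("hyper", None, "all")]
    for g in range(num_groups):
        if g in spatial_groups:
            steps.append(("group", g, "anchor"))
            steps.append(("group", g, "non-anchor"))
        else:
            steps.append(("group", g, "all"))
    return steps

if __name__ == "__main__":
    steps = decoding_schedule(num_groups=5, spatial_groups={0, 1})
    print(len(steps))  # 8 sequential steps, independent of latent height/width
    mask = checkerboard(16, 16)
    print(sum(row.count(True) for row in mask))  # 128 anchors in a 16x16 latent
    print(gaussian_bits(0, mu=0.0, sigma=1.0), gaussian_bits(3, mu=0.0, sigma=1.0))
```

For a 16×16 latent, a serial raster-scan context model needs 256 dependent decoding steps. This schedule needs 8 (1 + 2 + 2 + 1 + 1 + 1) regardless of spatial size, and each step decodes many symbols in parallel.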