Abstract: In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent methods utilize convolutional neural networks (CNN), variational autoencoders (VAE), invertible neural networks (INN), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features along with local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN and then utilizes cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.

What problem does this paper attempt to address?

The main problem this paper addresses is that existing learned image compression (LIC) methods perform poorly at capturing local redundancy. Although these methods outperform traditional image compression methods in rate-distortion (RD) performance, they struggle with fine local detail: CNN-based networks tend to capture high-level global features and are weaker at learning fine-grained local features. To overcome this, the authors introduce a feature encoding module and a cross-scale window-based attention module (CWAM), which strengthen the CNN's ability to represent complex data and capture additional local redundancy. The goal is to combine global features with local redundancy and thereby improve compression performance.

### Core improvement points of the paper:

1. **Feature Encoding Module**: This module encodes features before they are fed into the CNN, helping the network capture the important features in the image while spending less information on simple regions.
2. **Cross-Scale Window-Based Attention (CWAM)**: Inspired by the attention mechanism in transformers, CWAM enlarges the receptive field, promotes information exchange between windows of different scales, and improves the capture of local redundancy.

### Experimental Results:

The authors evaluated the method on the Kodak and CLIC datasets. The results show that its rate-distortion performance is comparable to current state-of-the-art image compression methods and better in some cases, with stronger robustness on high-resolution images in particular.
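To make the cross-scale window idea in the improvement points concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's CWAM: we assume "cross-scale" means that queries come from a small `win` x `win` window while keys and values come from a larger `ctx` x `ctx` window around it, and we use a single head with identity query/key/value projections. The function `cross_scale_window_attention` and its parameters are hypothetical names.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # H x W x C feature map as nested lists


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_window_attention(feat: Grid, win: int, ctx: int) -> Grid:
    """Hypothetical sketch: queries from each non-overlapping win x win window
    attend to keys/values in a larger ctx x ctx window around it (clipped at
    the border). ctx == win reduces to plain window attention. Single head and
    identity Q/K/V projections are simplifying assumptions, not the paper's CWAM."""
    if ctx < win:
        raise ValueError("context window must be at least as large as the query window")
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    scale = 1.0 / math.sqrt(C)  # scaled dot-product, as in transformers
    pad = (ctx - win) // 2      # how far the context extends beyond the query window
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for wy in range(0, H, win):
        for wx in range(0, W, win):
            # Keys/values: the enlarged context window, clipped to the feature map.
            y0, y1 = max(0, wy - pad), min(H, wy + win + pad)
            x0, x1 = max(0, wx - pad), min(W, wx + win + pad)
            keys = [feat[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Queries: every position inside the small window.
            for y in range(wy, min(H, wy + win)):
                for x in range(wx, min(W, wx + win)):
                    q = feat[y][x]
                    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
                    weights = softmax(scores)
                    # Output is the attention-weighted average of the values.
                    out[y][x] = [sum(w * k[c] for w, k in zip(weights, keys)) for c in range(C)]
    return out


# Usage: a 4x4 map with 2 channels; ctx=4 lets each 2x2 window see its neighbours.
feat = [[[0.1 * y, 0.1 * x] for x in range(4)] for y in range(4)]
local = cross_scale_window_attention(feat, win=2, ctx=2)
wider = cross_scale_window_attention(feat, win=2, ctx=4)
```

Setting `ctx=win` gives plain non-overlapping window attention, so a change in one window cannot affect another. Raising `ctx` lets each window attend to its neighbours, which is one way to enlarge the receptive field without paying for full global attention.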
### Formula Summary:

- Rate-distortion optimization objective:
\[ L = R(\hat{y}) + R(\hat{z}) + \lambda D(x, \hat{x}) \]
where \(R(\cdot)\) is the bit rate of the quantized latent \(\hat{y}\) and of the side information \(\hat{z}\), \(D\) is the distortion between the input \(x\) and the reconstruction \(\hat{x}\), and \(\lambda\) is a hyperparameter that controls the rate-distortion trade-off.
- Conditional distribution of the latent variables:
\[ p_{\hat{y}|\hat{z}}(\hat{y}|\hat{z}) = \mathcal{N}(\mu, \sigma^{2}) \]
where the parameters \(\mu\) and \(\sigma\) depend on the side information \(\hat{z}\).

Through these improvements, the proposed method not only improves compression efficiency but also significantly improves the quality of the reconstructed image while maintaining or reducing the bit rate.
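The objective above can be evaluated numerically with a small sketch. We assume, as is common in hyperprior-style codecs but not stated here, that the Gaussian for each latent element is integrated over a unit-width quantization bin to obtain a discrete probability, that rate is reported in bits per pixel, and that \(D\) is mean squared error. The unit-Gaussian model for \(\hat{z}\) is only a stand-in for whatever entropy model the paper uses, and the function names `rate_bits` and `rd_loss` are hypothetical.

```python
import math
from typing import List, Tuple


def gaussian_cdf(x: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rate_bits(y_hat: List[float], mu: List[float], sigma: List[float]) -> float:
    """Bits for quantized latents under N(mu, sigma^2), integrated over each
    unit-width bin: P(y) = Phi((y - mu + 0.5)/s) - Phi((y - mu - 0.5)/s)."""
    total = 0.0
    for y, m, s in zip(y_hat, mu, sigma):
        p = gaussian_cdf((y - m + 0.5) / s) - gaussian_cdf((y - m - 0.5) / s)
        total += -math.log2(max(p, 1e-9))  # floor avoids log(0)
    return total


def rd_loss(x: List[float], x_hat: List[float], y_hat: List[float],
            mu: List[float], sigma: List[float], z_hat: List[float],
            lam: float, num_pixels: int) -> Tuple[float, float, float]:
    """L = R(y_hat) + R(z_hat) + lam * D(x, x_hat); returns (loss, bpp, mse)."""
    r_y = rate_bits(y_hat, mu, sigma)
    # Assumption: a unit Gaussian stands in for the entropy model of z_hat.
    r_z = rate_bits(z_hat, [0.0] * len(z_hat), [1.0] * len(z_hat))
    bpp = (r_y + r_z) / num_pixels
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return bpp + lam * mse, bpp, mse
```

A latent that sits at its predicted mean with a small \(\sigma\) costs almost no bits, while one far from \(\mu\) is expensive; a larger \(\lambda\) pushes training toward lower distortion at the cost of a higher rate.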

Enhancing Learned Image Compression via Cross Window-based Attention

Efficient Lightweight Attention Based Learned Image Compression

Improved deep learning image compression model: performance optimization based on convolutional modules and local attention mechanism

End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling

Learned image compression via neighborhood-based attention optimization and context modeling with multi-scale guiding

Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling

Learned Image Compression with Inception Residual Blocks and Multi-Scale Attention Module

Enhancing High-Resolution Image Compression Through Local-Global Joint Attention Mechanism

Window-based Channel Attention for Wavelet-enhanced Learned Image Compression

Non-local Attention Optimized Deep Image Compression

Enhanced Invertible Encoding for Learned Image Compression

Learned Image Compression Using Cross-Component Attention Mechanism

End-to-End Image Compression Via Attention-Guided Information-Preserving Module

Image Compression using only Attention based Neural Networks

Learned Image Compression with Large Capacity and Low Redundancy of Latent Representation

Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression

Learned Image Compression With Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

End-to-End Learned Scalable Multilayer Feature Compression for Machine Vision Tasks

Efficient Learned Image Compression with Selective Kernel Residual Module and Channel-Wise Causal Context Model

Optimized Decoupled Structure with Non-Local Attention for Deep Image Compression

Region-of-interest and channel attention-based joint optimization of image compression and computer vision