AICT: An Adaptive Image Compression Transformer

Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
2023-07-12
Abstract: Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, as first, with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in long-range modeling dependencies due to their local connectivity and an increasing number of architectural biases and priors. On the contrary, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper aims to improve image compression efficiency while reducing decoder complexity. Specifically, it proposes an Adaptive Image Compression Transformer (AICT) to improve on existing entropy-coding methods based on convolutional networks (ConvNets), which are limited in handling long-distance dependencies because of their local connectivity. AICT extracts more compact latent representations and reconstructs higher-quality images by introducing a more effective Transformer-based channel-wise autoregressive prior model, together with a learnable scaling module and ConvNeXt-based pre-/post-processors. Experimental results show that the AICT framework significantly outperforms the Versatile Video Coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM on multiple benchmark datasets in terms of the trade-off between coding efficiency and decoder complexity.

The key contributions of the paper are:

- **Proposing a new Image Compression Transformer (ICT)**: This nonlinear transform-coding and spatial-channel autoregressive entropy-coding module, based on Swin Transformer blocks, effectively reduces the correlation of the latent variables and has a more flexible receptive field that adapts to contexts requiring short- or long-distance information.
- **Introducing the Adaptive Image Compression Transformer (AICT) model**: AICT uses a scale-adaptation module as a sandwich processor to enhance compression efficiency. This module consists of a neural scaling network and ConvNeXt-based pre-/post-processors, which jointly optimize differentiable adjustment layers and a content-dependent adjustment factor estimator.
- **Conducting extensive experimental verification**: Experiments were carried out on four widely used benchmark datasets to explore possible sources of coding gain and to demonstrate the effectiveness of AICT. A model-expansion analysis and ablation studies were also carried out to justify the architectural decisions.

Together, these contributions enable AICT to achieve higher compression efficiency than existing methods while maintaining a lower decoding time, which may help with efficient real-time visual data compression.
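To make the idea of a channel-wise autoregressive prior more concrete, the following is a minimal sketch in plain Python. It is not the AICT implementation: `toy_param_net` is a hypothetical stand-in for the learned Transformer-based prior network, the unit scale is an assumption, and all function names are illustrative. The sketch only shows the general mechanism, in which latent channels are split into slices and the entropy parameters of each slice are predicted from the slices already decoded.

```python
import math


def split_into_slices(num_channels, num_slices):
    """Return (start, stop) channel ranges of roughly equal width."""
    base, extra = divmod(num_channels, num_slices)
    ranges, start = [], 0
    for i in range(num_slices):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def toy_param_net(context):
    """Hypothetical stand-in for the learned prior network.

    Predicts a (mean, scale) pair for the next slice from the channels
    decoded so far. A real model would use a trained network here.
    """
    if not context:
        return 0.0, 1.0
    return sum(context) / len(context), 1.0


def gaussian_bits(value, mean, scale):
    """Bits needed to code an integer under a Gaussian with unit-width bins."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

    prob = max(cdf(value + 0.5) - cdf(value - 0.5), 1e-9)
    return -math.log2(prob)


def estimate_bits(latent, num_slices):
    """Estimate the bit cost of one latent vector (quantized channel values).

    Each slice is coded with parameters conditioned only on earlier slices,
    so a decoder can reproduce the same parameters in the same order.
    """
    total, decoded = 0.0, []
    for start, stop in split_into_slices(len(latent), num_slices):
        mean, scale = toy_param_net(decoded)
        for v in latent[start:stop]:
            total += gaussian_bits(v, mean, scale)
        decoded.extend(latent[start:stop])
    return total
```

In this toy setting, calling `estimate_bits([5, 5, 5, 5], 2)` yields a lower estimate than `estimate_bits([5, 5, 5, 5], 1)`, because the second slice is coded with a mean predicted from the first slice rather than a fixed mean of zero.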