Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Haisheng Fu, Feng Liang, Jianping Lin, Bing Li, Mohammad Akbari, Jie Liang, Guohe Zhang, Dong Liu, Chengjie Tu, Jingning Han
2024-02-10
Abstract: Deep learning-based image compression methods have recently made significant progress and gradually outperformed traditional approaches, including the latest standard, Versatile Video Coding (VVC), in both PSNR and MS-SSIM metrics. Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. Various models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian. Existing schemes use only one of these models. However, due to the vast diversity of images, it is not optimal to use one model for all images, or even for different regions within one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt to different contents in different images and different regions of one image more accurately and efficiently, given the same complexity. In addition, for the encoding/decoding network design, we propose concatenated residual blocks (CRBs), in which multiple residual blocks are serially connected with additional shortcut connections. The CRBs improve the learning ability of the network, which further improves the compression performance. Experimental results on the Kodak, Tecnick-100 and Tecnick-40 datasets show that the proposed scheme outperforms all the leading learning-based methods and existing compression standards, including VVC intra coding (4:4:4 and 4:2:0), in terms of PSNR and MS-SSIM. The source code is available at https://github.com/fengyurenpingsheng
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
Deep learning-based image compression methods have made remarkable progress and have gradually surpassed traditional compression methods (such as the latest standard, Versatile Video Coding, VVC) in PSNR and MS-SSIM. The paper argues that two main problems remain:

1. **Single-family entropy models**: Existing schemes usually use only one probability model (such as an autoregressive model, softmax, a logistic mixture model, a Gaussian mixture model or a Laplacian model) to represent the distribution of the latent representations. Because image content is so diverse, using a single model for all images, or for different regions of the same image, is not optimal.
2. **Spatial redundancy**: Even with complex network structures, some spatial redundancy remains in the latent representation, which limits compression performance.

To address these problems, the authors propose the following solutions:

- **A more flexible mixture model**: A discretized Gaussian-Laplacian-Logistic Mixture Model (GLLMM), which can adapt to the content of different images and different regions of the same image more accurately and efficiently at the same complexity.
- **An improved encoding/decoding network architecture**: Concatenated Residual Blocks (CRB), in which multiple residual blocks are connected in series with additional shortcut connections. This improves the learning ability of the network and thereby further improves compression performance.

The experimental results show that the proposed scheme outperforms the existing leading learning-based methods and traditional compression standards (including VVC intra coding) in PSNR and MS-SSIM on the Kodak, Tecnick-100 and Tecnick-40 datasets.
Specifically, on the Kodak dataset at bit rates above 0.4 bpp, the method is 0.2 to 0.3 dB higher than other methods; on the Tecnick dataset at bit rates above 0.2 bpp, it is 0.3 to 0.4 dB higher than VVC (4:4:4). These results represent the state of the art in learning-based image compression at the time of publication.
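
To make the idea of a discretized Gaussian-Laplacian-Logistic mixture more concrete, the following is a minimal sketch of how the probability mass of one quantized latent value could be evaluated. It is not the authors' implementation. It assumes that each of the three families contributes its own small mixture, that the families are combined by a second set of weights, and that quantization to integers is modeled by integrating each density over a unit-width bin. The names `gllmm_pmf` and `bits`, and all parameter values, are hypothetical; in a learned codec these parameters would be predicted per latent element by the network.

```python
import math

def gaussian_cdf(x, mu, scale):
    return 0.5 * (1.0 + math.erf((x - mu) / (scale * math.sqrt(2.0))))

def laplace_cdf(x, mu, scale):
    z = (x - mu) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def logistic_cdf(x, mu, scale):
    z = (x - mu) / scale
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Order of families assumed throughout this sketch.
FAMILIES = (gaussian_cdf, laplace_cdf, logistic_cdf)

def gllmm_pmf(y, family_weights, components):
    """Probability mass of the integer symbol y.

    family_weights: three non-negative weights summing to 1 (assumed).
    components: for each family, a list of (weight, mu, scale) tuples
                whose weights sum to 1 (assumed).
    """
    total = 0.0
    for cdf, p_family, comps in zip(FAMILIES, family_weights, components):
        for w, mu, scale in comps:
            # Mass of the bin [y - 0.5, y + 0.5) under this component.
            mass = cdf(y + 0.5, mu, scale) - cdf(y - 0.5, mu, scale)
            total += p_family * w * mass
    return total

def bits(y, family_weights, components, floor=1e-9):
    """Estimated code length in bits for symbol y (floored for stability)."""
    return -math.log2(max(gllmm_pmf(y, family_weights, components), floor))

# Hypothetical parameters for a single latent element.
family_weights = (0.5, 0.3, 0.2)
components = (
    [(0.6, 0.0, 1.0), (0.4, 2.0, 3.0)],   # Gaussian: (weight, mean, std)
    [(0.7, 0.0, 0.5), (0.3, -1.0, 2.0)],  # Laplacian: (weight, mean, scale)
    [(0.5, 0.0, 1.0), (0.5, 1.0, 4.0)],   # Logistic: (weight, mean, scale)
)

if __name__ == "__main__":
    for y in (-2, 0, 3):
        print(y, gllmm_pmf(y, family_weights, components),
              bits(y, family_weights, components))
```

Because each component is integrated over adjacent unit bins, the masses over all integers sum to one, so the output can be handed directly to an arithmetic coder. Mixing families with different tail behavior is what lets the model fit both sharply peaked and heavy-tailed latent distributions.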