Image Compression with Encoder-Decoder Matched Semantic Segmentation

Trinh Man Hoang,Jinjia Zhou,Yibo Fan
DOI: https://doi.org/10.1109/CVPRW50498.2020.00088
2021-01-30
Abstract:In recent years, layered image compression is demonstrated to be a promising direction, which encodes a compact representation of the input image and apply an up-sampling network to reconstruct the image. To further improve the quality of the reconstructed image, some works transmit the semantic segment together with the compressed image data. Consequently, the compression ratio is also decreased because extra bits are required for transmitting the semantic segment. To solve this problem, we propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS). And then, followed by the semantic segmentation, a special convolution neural network is used to enhance the inaccurate semantic segment. As a result, the accurate semantic segment can be obtained in the decoder without requiring extra bits. The experimental results show that the proposed EDMS framework can get up to 35.31% BD-rate reduction over the HEVC-based (BPG) codec, 5% bitrate, and 24% encoding time saving compare to the state-of-the-art semantic-based image codec.
Image and Video Processing,Computer Vision and Pattern Recognition,Multimedia
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to improve the quality of image compression without increasing the transmission of additional bits. Specifically, the existing semantic - based image compression methods enhance the quality of compressed images by transmitting semantic segmentation information, but this method requires additional bits to store and transmit these semantic information, thus reducing the compression ratio. To solve this problem, the paper proposes a new hierarchical image compression framework - Encoder - Decoder Matched Semantic Segmentation (EDMS). ### Main contributions: 1. **Avoid additional bit transmission**: By applying the same semantic segmentation network in the encoder and decoder respectively, the same semantic segmentation result can be obtained at the decoder end as at the encoder end without the need for additional transmission of semantic information. 2. **Improve the quality of reconstructed images**: In order to compensate for the inaccuracy of the semantic segmentation information extracted from the up - sampled images, the paper introduces a special convolutional neural network (CNN) to enhance the accuracy of semantic segmentation. 3. **Performance improvement**: The experimental results show that the proposed EDMS framework outperforms the existing traditional image compression methods and the state - of - the - art semantic - based image compression methods in terms of peak signal - to - noise ratio (PSNR) and multi - scale structural similarity (MS - SSIM) metrics. ### Technical details: - **Semantic segmentation network**: Apply the semantic segmentation network in the encoder and decoder respectively to ensure that the same semantic segmentation results are obtained at both ends. - **Semantic segmentation enhancement**: Use a special Recursive Residual Network (SMapNet) to enhance the semantic segmentation information extracted from the up - sampled images, making it closer to the semantic segmentation result of the original image. - **Overall framework**: It includes three main networks: CompNet (for down - sampling), FineNet (for image synthesis) and SMapNet (for semantic segmentation enhancement). ### Experimental results: - **Performance comparison**: On the ADE20K test set, the average PSNR of the EDMS framework in the range of 1.1 - 2.1 bpp is 2.5 dB higher than that of BPG. - **Processing speed**: Compared with DSSLIC, the EDMS framework reduces the bit rate by 5% at the same PSNR, and the encoding time is reduced by 24%. - **Generalization ability**: The test results on the Kodak data set show that the EDMS framework also performs well on images with different distributions, and the average PSNR in the range of 1.3 - 2.3 bpp is 2.4 dB higher than that of BPG. In conclusion, this paper proposes an innovative image compression method, which not only improves the quality of compressed images, but also avoids the transmission of additional bits, thus achieving significant improvements in compression ratio and processing speed.