Interpretable Image/Video Compression by Extracting the Least Context Map

Huan Huang, Wei Yang
DOI: https://doi.org/10.1145/3603781.3603841
2023-01-01
Abstract: Current image compression methods based on deep neural networks lack interpretability. Most of them follow a standard encoder-decoder framework and cannot be directly applied to video compression. We present a novel and interpretable finder-generator framework for image/video compression. Rather than compressing the image into a multi-channel bitstream in a downsampled bottleneck layer, the finder analyses the input image and selects important points on a one-channel binary map. This map retains the original width and height to preserve spatial information; we name it the least context map (LCM). The generator analyses the LCM and restores the original image based on its trained parameters. We put forward two different selection strategies for guiding the finder to extract the LCM. By extracting LCMs from images, our framework can reduce the size of real-world traffic surveillance videos by 96% compared to most common video codecs and by 85% compared to the next-generation video compression codec VP9. This size reduction arises because adjacent frames always share similar LCMs, so LCMs can be compressed significantly along the time axis. In addition, extensive experiments on the Kodak dataset demonstrate that our model surpasses state-of-the-art image compression methods at low bit-rates. We require an average compressed size of only 2.01 kilobytes to achieve a high average MS-SSIM score of 0.9. This size is 50% smaller than JPEG, 43% smaller than FRRNN, and 11% smaller than WebP. Further comparative experiments on image generation demonstrate that the LCM is superior to the semantic map and the edge map, offering higher information capacity while requiring less storage.
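The claim that similar LCMs in adjacent frames compress well along the time axis can be illustrated with a minimal sketch. It is not the authors' codec: the XOR delta between consecutive maps and the use of zlib are stand-ins chosen for illustration, the binary maps are synthetic random data rather than finder output, and all function names and parameters (`make_frames`, `density`, `flips`, and so on) are hypothetical.

```python
import random
import zlib


def pack_bits(bits):
    """Pack a flat list of 0/1 pixels into bytes, 8 pixels per byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def xor_delta(prev, curr):
    """Mark the pixels that differ between two binary maps."""
    return [p ^ c for p, c in zip(prev, curr)]


def make_frames(n_frames, n_pixels, density, flips, seed=0):
    """Synthetic stand-in for LCMs: each frame flips a few pixels of the last."""
    rng = random.Random(seed)
    frame = [1 if rng.random() < density else 0 for _ in range(n_pixels)]
    frames = [frame]
    for _ in range(n_frames - 1):
        frame = list(frame)
        for idx in rng.sample(range(n_pixels), flips):
            frame[idx] ^= 1
        frames.append(frame)
    return frames


def compressed_sizes(frames):
    """Total zlib size when coding frames independently vs. as XOR deltas."""
    independent = sum(len(zlib.compress(pack_bits(f), 9)) for f in frames)
    # Keep the first map whole, then store only the changes between neighbours.
    deltas = [frames[0]] + [xor_delta(a, b) for a, b in zip(frames, frames[1:])]
    temporal = sum(len(zlib.compress(pack_bits(d), 9)) for d in deltas)
    return independent, temporal


# Assumed toy setting: 30 frames of 64x64 maps, 8 changed pixels per frame.
frames = make_frames(n_frames=30, n_pixels=64 * 64, density=0.1, flips=8)
independent, temporal = compressed_sizes(frames)
print(f"independent: {independent} bytes, temporal deltas: {temporal} bytes")
```

Because each delta map is almost entirely zeros, it compresses far better than a full map. A decoder would recover frame t by XOR-ing the delta with frame t-1.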