QARV: Quantization-Aware ResNet VAE for Lossy Image Compression

Zhihao Duan, Ming Lu, Jack Ma, Yuning Huang, Zhan Ma, Fengqing Zhu
DOI: https://doi.org/10.1109/TPAMI.2023.3322904
2023-12-02
Abstract: This paper addresses the problem of lossy image compression, a fundamental problem in image processing and information theory that is involved in many real-world applications. We start by reviewing the framework of variational autoencoders (VAEs), a powerful class of generative probabilistic models that has a deep connection to lossy compression. Based on VAEs, we develop a novel scheme for lossy image compression, which we name quantization-aware ResNet VAE (QARV). Our method incorporates a hierarchical VAE architecture integrated with test-time quantization and quantization-aware training, without which efficient entropy coding would not be possible. In addition, we design the neural network architecture of QARV specifically for fast decoding and propose an adaptive normalization operation for variable-rate compression. Extensive experiments are conducted, and results show that QARV achieves variable-rate compression, high-speed decoding, and a better rate-distortion performance than existing baseline methods. The code of our method is publicly accessible at https://github.com/duanzhiihao/lossy-vae
Image and Video Processing
What problem does this paper attempt to address?
This paper addresses lossy image compression, a fundamental problem in image processing and information theory with many real-world applications. Specifically, it proposes a new lossy image compression scheme based on variational autoencoders (VAEs), called Quantization-Aware ResNet VAE (QARV). The method aims to improve lossy image compression in four ways:
1. **Quantization-Aware Training and Test-Time Quantization**: QARV combines test-time quantization with quantization-aware training, which makes efficient entropy coding possible. This addresses the problem that entropy coding algorithms cannot be applied directly to the continuous-valued latent variables of a standard VAE.
2. **Fast Decoding**: QARV's neural network architecture is designed specifically for fast decoding. The paper introduces a new block architecture that shifts more computation from the decoder to the encoder, giving a faster decoding speed than most previous image compression methods.
3. **Variable-Rate Compression**: The paper introduces a new variable-rate compression method, Adaptive Layer Normalization (AdaLN), which can be used in a plug-and-play manner in modern neural network architectures. It allows a single QARV model to cover a continuously adjustable range of compression rates.
4. **No Need for Context Models**: Unlike most existing methods, QARV avoids spatial/channel autoregressive context models, which are complex to design and may be computationally infeasible in practical applications. QARV instead relies on its hierarchical VAE architecture, which is more computationally efficient while still achieving better rate-distortion performance than existing baseline methods.
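To make the link between quantization and entropy coding concrete, here is a minimal NumPy sketch. It is not taken from the QARV code. It assumes additive uniform noise as the training-time stand-in for rounding (a common choice in learned compression, which may differ from the paper's exact scheme) and a Gaussian integrated over unit-width bins as the probability model. The names `quantize_train`, `quantize_test`, and `estimated_bits` are hypothetical.

```python
import math
import numpy as np

def quantize_train(z, rng):
    """Training-time proxy (assumption): uniform noise in [-0.5, 0.5) replaces rounding."""
    return z + rng.uniform(-0.5, 0.5, size=z.shape)

def quantize_test(z):
    """Test-time quantization: round each latent to the nearest integer."""
    return np.round(z)

def _gauss_cdf(x):
    # Standard normal CDF, applied elementwise via math.erf.
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def estimated_bits(z_hat, mu, sigma):
    """Code length in bits when each value is coded with the Gaussian
    probability mass of the unit-width bin centred on it."""
    upper = _gauss_cdf((z_hat + 0.5 - mu) / sigma)
    lower = _gauss_cdf((z_hat - 0.5 - mu) / sigma)
    p = np.clip(upper - lower, 1e-9, 1.0)  # avoid log(0)
    return float(-np.log2(p).sum())

rng = np.random.default_rng(0)
z = np.array([0.2, 1.7, -2.4])
z_noisy = quantize_train(z, rng)   # differentiable surrogate used in training
z_hat = quantize_test(z)           # discrete symbols an entropy coder can handle
rate = estimated_bits(z_hat, mu=0.0, sigma=1.0)
```

Because `z_hat` takes integer values with an explicit probability mass, a standard entropy coder can encode it. A continuous-valued latent has no such finite code length.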
In summary, the main objective of this paper is to provide a more efficient, more flexible, and more powerful lossy image compression method through QARV, with particular emphasis on fast decoding and variable-rate compression while maintaining high-quality image reconstruction.
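The idea of an adaptive normalization layer for variable-rate compression can be sketched as follows. This is an illustrative guess at the general mechanism, not the paper's implementation. It assumes that features are layer-normalized and then scaled and shifted by values predicted linearly from an embedding of the rate setting. The function `adaptive_layer_norm`, the linear projections `w_scale` and `w_shift`, and the `(1 + scale)` form are all hypothetical choices.

```python
import numpy as np

def adaptive_layer_norm(x, cond, w_scale, w_shift, eps=1e-5):
    """x: (N, C) features; cond: (D,) embedding of the rate setting (assumed).
    w_scale, w_shift: (D, C) hypothetical projection matrices."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + eps)   # plain layer normalization
    scale = cond @ w_scale                     # (C,) condition-dependent scale
    shift = cond @ w_shift                     # (C,) condition-dependent shift
    return x_norm * (1.0 + scale) + shift

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_scale = rng.normal(size=(3, 8)) * 0.1
w_shift = rng.normal(size=(3, 8)) * 0.1

low_rate = adaptive_layer_norm(x, np.array([1.0, 0.0, 0.0]), w_scale, w_shift)
high_rate = adaptive_layer_norm(x, np.array([0.0, 0.0, 1.0]), w_scale, w_shift)
```

The same weights produce different feature statistics for different condition vectors, which is how one set of network parameters could serve several rate-distortion trade-offs.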