Towards image compression with perfect realism at ultra-low bitrates

Marlène Careil, Matthew J. Muckley, Jakob Verbeek, Stéphane Lathuilière
2024-03-19
Abstract: Image codecs are typically optimized to trade off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as a global image description to provide additional context. We dub our model PerCo for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, compressing a 512x768 Kodak image with less than 153 bytes. Despite this ultra-low bitrate, our approach maintains the ability to reconstruct realistic images. We find that our model leads to reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality is less dependent on the bitrate than previous methods.
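The byte count quoted in the abstract follows from simple arithmetic: bits per pixel times pixel count, divided by eight. The sketch below is an illustration only. The helper name `bytes_for_image` is hypothetical, and 0.003 bpp is the rounded rate quoted above, so the result only needs to stay under the stated 153-byte bound rather than match it exactly.

```python
def bytes_for_image(width: int, height: int, bpp: float) -> float:
    """Compressed size in bytes for an image coded at `bpp` bits per pixel."""
    total_bits = width * height * bpp
    return total_bits / 8.0


# Kodak image resolution taken from the abstract.
WIDTH, HEIGHT = 512, 768

# Lowest and highest rates quoted in the abstract (rounded values).
size_low = bytes_for_image(WIDTH, HEIGHT, 0.003)
size_high = bytes_for_image(WIDTH, HEIGHT, 0.1)

print(f"pixels:        {WIDTH * HEIGHT}")
print(f"0.003 bpp ->   {size_low:.1f} bytes")
print(f"0.1 bpp   ->   {size_high:.1f} bytes")
```

At the rounded rate the whole image fits in roughly 147 bytes, which is consistent with the "less than 153 bytes" figure; at 0.1 bpp the same image takes about 4.9 kB.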
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the loss of realism in image compression at very low bit rates, in particular how to preserve the realism and visual quality of reconstructed images when very few bits are available. Image codecs are typically optimized to trade off bit rate against distortion. At low bit rates this leads to obvious compression artifacts (such as blurring and blocking effects), and these cannot be completely avoided even when the codec is trained with perceptual or adversarial losses.

To solve this problem, the authors propose a new image compression model, PerCo (Perceptual Compression). PerCo decodes with an iterative diffusion model that is conditioned on a vector-quantized image representation and on a global image description, which provides additional context. Specifically, the main contributions of PerCo include:

1. **Developed a new diffusion model**: PerCo combines a vector-quantized latent-space representation with a textual image description for image compression.
2. **Achieved high-quality reconstruction at very low bit rates**: PerCo can generate realistic image reconstructions at a bit rate as low as 0.003 bits per pixel (bpp), significantly outperforming existing methods.
3. **Obtained state-of-the-art FID and KID performance**: On the MS-COCO 30k dataset, PerCo performs well across bit rates, and its FID and KID scores remain relatively stable as the bit rate changes, which is consistent with the goal of a perfect-realism codec.

### Formula Representation

The formulas involved in the paper mainly include:

- **Rate-distortion function**: \[ L_{RD} = E_{P_x} \left[ E_{P_{z|x}} \left[ L_R(z) + \lambda L_D(\hat{x}(z), x) \right] \right] \] where \( P_x \) is the data distribution, \( P_{z|x} \) is the posterior distribution of the quantization code, \( L_R(z) \) is the rate term, \( L_D(\hat{x}(z), x) \) is the distortion term, and \( \lambda \) is the weight that balances rate against distortion.
- **Diffusion model loss**: \[ L_t^{\text{Diff}} = E_{P_x} E_{P_{z, x_t|x}} \left\| x_{t-1} - \hat{x}_{t-1}(x_t, z) \right\|_2^2 \] and its equivalent form: \[ L_t^{\text{Diff}} \propto E_{P_x} E_{P_{z, x_t|x}} E_{\epsilon \sim \mathcal{N}(0, 1)} \left\| \epsilon - \epsilon_\theta(x_t, z, t) \right\|_2^2 \]
- **Vector-quantization loss**: \[ L_{VQ} = E_{h_s} \left[ \left\| \text{sg}(h_s) - z_q \right\|_2^2 + \left\| \text{sg}(z_q) - h_s \right\|_2^2 \right] \] where \( \text{sg}(\cdot) \) represents the stop-gradient operation, and \( z_q \) is \( h_s \) mapped to the closest codebook entry.

Through these improvements, PerCo can achieve high-quality image compression at very low bit rates while maintaining the realism and visual quality of the image.
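The vector-quantization loss can be made concrete with a small, dependency-free sketch. It is not the authors' implementation: the function names, the toy codebook, and the feature values are all hypothetical, and the labels "codebook term" and "commitment term" are borrowed from common VQ-VAE usage rather than from this summary. Plain Python has no automatic differentiation, so `sg(.)` acts as the identity here and only the forward value of \( L_{VQ} \) is computed. In a real training setup the stop-gradient decides which term updates the codebook and which updates the encoder.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(h: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Map h to its closest codebook entry; return (index, entry)."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(h, codebook[k]))
    return index, codebook[index]


def vq_loss(features: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Forward value of L_VQ, averaged over the feature vectors h_s.

    sg(.) is the identity in the forward pass, so both terms evaluate to
    the same number; they differ only in where gradients would flow.
    """
    total = 0.0
    for h in features:
        _, z_q = quantize(h, codebook)
        codebook_term = squared_distance(h, z_q)    # ||sg(h_s) - z_q||^2
        commitment_term = squared_distance(z_q, h)  # ||sg(z_q) - h_s||^2
        total += codebook_term + commitment_term
    return total / len(features)


# Toy data (hypothetical values, chosen only for illustration).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.25, 0.0), (1.0, 1.5), (3.5, 0.0)]

# Only these integer indices would need to be transmitted to the decoder.
indices = [quantize(h, codebook)[0] for h in features]
loss = vq_loss(features, codebook)

print("codebook indices:", indices)
print("L_VQ (forward value):", loss)
```

Each feature vector is replaced by the index of its nearest codebook entry, which is what makes the representation discrete and cheap to store. The loss pulls codebook entries and encoder outputs toward each other so that this replacement loses as little information as possible.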