Abstract: Image codecs are typically optimized to trade-off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as a global image description to provide additional context. We dub our model PerCo for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, compressing a 512x768 Kodak image with less than 153 bytes. Despite this ultra-low bitrate, our approach maintains the ability to reconstruct realistic images. We find that our model leads to reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality is less dependent on the bitrate than previous methods.
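To make the quoted rates concrete, the following minimal sketch converts between bits per pixel and file size for a 512x768 image. The function names are illustrative and do not come from the paper; the only inputs are the figures stated in the abstract.

```python
def bits_per_pixel_to_bytes(bpp, width, height):
    """Compressed size in bytes for a given rate in bits per pixel."""
    return bpp * width * height / 8


def bytes_to_bits_per_pixel(num_bytes, width, height):
    """Rate in bits per pixel for a given compressed size in bytes."""
    return num_bytes * 8 / (width * height)


# Kodak image resolution quoted in the abstract (393,216 pixels).
width, height = 768, 512

low = bits_per_pixel_to_bytes(0.003, width, height)
high = bits_per_pixel_to_bytes(0.1, width, height)

print(f"0.003 bpp -> {low:.1f} bytes")   # 147.5 bytes
print(f"0.1 bpp   -> {high:.1f} bytes")  # 4915.2 bytes
print(f"153 bytes -> {bytes_to_bits_per_pixel(153, width, height):.4f} bpp")  # 0.0031 bpp
```

At exactly 0.003 bpp the image takes about 147.5 bytes, which is consistent with the abstract's "less than 153 bytes"; 153 bytes corresponds to roughly 0.0031 bpp.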

What problem does this paper attempt to address?

This paper addresses the loss of realism and visual quality in image compression at very low bitrates. Conventional image codecs are optimized to trade off bitrate against distortion. At low bitrates this produces obvious compression artifacts such as blurring and blocking, and these artifacts persist even when the codec is trained with perceptual or adversarial losses.

To address this, the authors propose a new image compression model, PerCo (Perceptual Compression). PerCo decodes with an iterative diffusion model that is conditioned on a vector-quantized image representation and on a global image description, which provides additional context.

The main contributions of PerCo are:

1. **A new diffusion-based codec**: PerCo conditions a diffusion decoder on a vector-quantized latent-space representation and a textual image description.
2. **High-quality reconstruction at very low bitrates**: PerCo generates realistic reconstructions at rates as low as 0.003 bits per pixel (bpp), significantly outperforming existing methods.
3. **State-of-the-art FID and KID**: On the MS-COCO 30k dataset, PerCo performs well across bitrates, and its FID and KID remain relatively stable as the bitrate changes, as expected of a codec that targets perfect realism.

### Formula Representation

The main formulas in the paper are:

- **Rate-distortion objective**: \[ L_{RD} = E_{P_x} \left[ E_{P_{z|x}} \left[ L_R(z) + \lambda L_D(\hat{x}(z), x) \right] \right] \] where \( P_x \) is the data distribution, \( P_{z|x} \) is the distribution of the quantized code \( z \) given the image \( x \), \( L_R(z) \) is the rate term, \( L_D(\hat{x}(z), x) \) is the distortion term, and \( \lambda \) weights distortion against rate.
- **Diffusion model loss**: \[ L_t^{\text{Diff}} = E_{P_x} E_{P_{z, x_t|x}} \left\| x_{t - 1} - \hat{x}_{t - 1}(x_t, z) \right\|_2^2 \] and its equivalent noise-prediction form: \[ L_t^{\text{Diff}} \propto E_{P_x} E_{P_{z, x_t|x}} E_{\epsilon \sim \mathcal{N}(0, I)} \left\| \epsilon - \epsilon_\theta(x_t, z, t) \right\|_2^2 \]
- **Vector-quantization loss**: \[ L_{VQ} = E_{h_s} \left[ \left\| \text{sg}(h_s) - z_q \right\|_2^2 + \left\| \text{sg}(z_q) - h_s \right\|_2^2 \right] \] where \( \text{sg}(\cdot) \) is the stop-gradient operation and \( z_q \) is \( h_s \) mapped to its closest codebook entry.

Together, these components let PerCo compress images at very low bitrates while preserving their realism and visual quality.
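The vector-quantization step and the value of \( L_{VQ} \) can be illustrated with a small pure-Python sketch. The codebook and feature vectors below are made-up toy values, and the function names are hypothetical; this is not the authors' implementation. In the forward pass the stop-gradient is an identity, so both terms of the loss have the same value and differ only in which side receives gradients during training (the first term updates the codebook, the second the encoder).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(h_s, codebook):
    """Return (index, entry) of the codebook vector closest to h_s."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(h_s, codebook[k]))
    return index, codebook[index]


def vq_loss_value(features, codebook):
    """Forward value of L_VQ, averaged over the given feature vectors.

    sg(.) does not change values, only gradients, so the two terms
    ||sg(h_s) - z_q||^2 and ||sg(z_q) - h_s||^2 are numerically equal here.
    """
    total = 0.0
    for h_s in features:
        _, z_q = quantize(h_s, codebook)
        total += 2.0 * squared_distance(h_s, z_q)
    return total / len(features)


# Toy data (assumed for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
features = [[0.9, 1.2], [0.1, -0.2], [0.2, 1.7]]

indices = [quantize(h, codebook)[0] for h in features]
print(indices)  # [1, 0, 2]
print(vq_loss_value(features, codebook))
```

Only the codebook indices need to be transmitted; the decoder looks up the corresponding entries to recover \( z_q \).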

Towards image compression with perfect realism at ultra-low bitrates

Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion

Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders

Lossy Image Compression with Foundation Diffusion Models

Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

Multi-Realism Image Compression with a Conditional Generator

Research on Application of Perceptual Model Based Image Compression

A Residual Diffusion Model for High Perceptual Quality Codec Augmentation

Learned Image Compression for Machine Perception

PerCo (SD): Open Perceptual Compression

End-to-end image compression method based on perception metric

Idempotence and Perceptual Image Compression

Optimally Controllable Perceptual Lossy Compression

Real-Time Adaptive Image Compression

Perceptually Optimizing Deep Image Compression

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

On Perceptual Lossy Compression: The Cost of Perceptual Reconstruction and An Optimal Training Framework

Visual Analysis Motivated Rate-Distortion Model for Image Coding

Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines

Perceptual Quality-Oriented Rate Allocation via Distillation from End-to-End Image Compression

Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior