Abstract: Lossy compression algorithms are typically designed and analyzed through the lens of Shannon's rate-distortion theory, where the goal is to achieve the lowest possible distortion (e.g., low MSE or high SSIM) at any given bit rate. However, in recent years, it has become increasingly accepted that "low distortion" is not a synonym for "high perceptual quality", and in fact optimization of one often comes at the expense of the other. In light of this understanding, it is natural to seek a generalization of rate-distortion theory which takes perceptual quality into account. In this paper, we adopt the mathematical definition of perceptual quality recently proposed by Blau & Michaeli (2018), and use it to study the three-way tradeoff between rate, distortion, and perception. We show that restricting the perceptual quality to be high generally leads to an elevation of the rate-distortion curve, thus necessitating a sacrifice in either rate or distortion. We prove several fundamental properties of this triple-tradeoff, calculate it in closed form for a Bernoulli source, and illustrate it visually on a toy MNIST example.
What problem does this paper attempt to address?
This paper addresses how lossy compression should balance three quantities: bit rate, distortion, and perceptual quality. Traditionally, lossy compression algorithms are designed and analyzed using Shannon's rate-distortion theory, whose goal is to achieve the lowest distortion (for example, low mean squared error (MSE) or high structural similarity index (SSIM)) at a given bit rate. However, recent research shows that "low distortion" is not equivalent to "high perceptual quality", and that optimizing one often comes at the expense of the other.
The paper therefore extends rate-distortion theory by introducing perceptual quality as a third factor and studying the trade-off among all three. Specifically, it adopts the mathematical definition of perceptual quality proposed by Blau & Michaeli (2018), analyzes the three-way trade-off among rate, distortion, and perception, and contributes the following:
1. **Three-way trade-off**: Requiring higher perceptual quality generally raises the rate-distortion curve, so either the rate or the distortion must be sacrificed.
2. **Theoretical properties**: The authors prove several basic properties of the rate-distortion-perception function \(R(D, P)\), including monotonicity and convexity, and show that in some cases the rate-distortion curve under perfect perceptual quality is necessarily higher than the curve without a perceptual constraint.
3. **Experimental illustration**: Experiments on the MNIST data set illustrate the trade-off between perceptual quality and distortion at different bit rates. In particular, at low bit rates, optimizing for perceptual quality can noticeably improve the visual appearance of the reconstructions, although the reconstructions may then deviate more from the original images.
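The monotonicity and convexity properties in item 2 can be written out explicitly. The following is a sketch in the notation of \(R(D, P)\); the primed symbols \(D', P'\) and the weight \(\lambda\) are introduced here for illustration, and the condition attached to convexity (that the divergence \(d\) is convex in its second argument) is an assumption that this summary does not spell out. For \(D \leq D'\) and \(P \leq P'\), monotonicity reads

\[
R(D', P') \leq R(D, P),
\]

and, for any \(\lambda \in [0, 1]\), convexity reads

\[
R\big(\lambda D + (1 - \lambda) D',\; \lambda P + (1 - \lambda) P'\big) \leq \lambda R(D, P) + (1 - \lambda) R(D', P').
\]

Letting \(P' \to \infty\) removes the perceptual constraint, so monotonicity in \(P\) already implies that \(R(D, P)\) is never below the classical rate-distortion function \(R(D)\).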
### Key Formulas
- Definition of the rate-distortion-perception function:
\[
R(D, P)=\min_{p_{\hat{X}|X}} I(X; \hat{X}) \quad \text{s.t.} \quad \mathbb{E}[\Delta(X, \hat{X})] \leq D, \quad d(p_X, p_{\hat{X}}) \leq P
\]
where:
- \(I(X; \hat{X})\) is the mutual information between the source \(X\) and its reconstruction \(\hat{X}\).
- \(\mathbb{E}[\Delta(X, \hat{X})]\) is the expected distortion.
- \(d(p_X, p_{\hat{X}})\) is a divergence between the source and reconstruction distributions (such as total variation distance); smaller values correspond to higher perceptual quality.
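- Closed-form solution for a Bernoulli source \(X \sim \text{Bern}(p)\) with \(p \leq 1/2\), Hamming distortion, and total variation distance as \(d\):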
\[
R(D, P)=
\begin{cases}
H_b(p)-H_b(D) & D \in S_1 \\
2H_b(p)+H_b(p - P)-H_t\left(\frac{D - P}{2}, p\right)-H_t\left(\frac{D + P}{2}, q\right) & D \in S_2 \\
0 & D \in S_3
\end{cases}
\]
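where:
- \(q = 1 - p\), with \(p\) the Bernoulli parameter of the source.
- \(H_b(\alpha)\) is the entropy of a Bernoulli random variable with probability \(\alpha\).
- \(H_t(\alpha, \beta)\) is the entropy of a ternary random variable with probabilities \(\alpha\), \(\beta\), and \(1 - \alpha - \beta\).
- \(S_1 = [0, D_1)\), \(S_2 = [D_1, D_2)\), \(S_3 = [D_2, \infty)\).
- \(D_1=\frac{P}{1 - 2(p - P)}\), \(D_2 = 2pq-(q - p)P\).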
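To make the two formulas above concrete, here is a minimal Python sketch that evaluates the closed-form Bernoulli expression and, separately, the three quantities in the definition for one specific binary channel. It is an illustration, not the authors' code. The function names (`rdp_bernoulli`, `channel_terms`) are hypothetical. The sketch assumes entropies in bits, \(q = 1 - p\), \(0 < p < 1/2\), Hamming distortion, and total variation distance; it also assumes that the perceptual constraint is inactive for \(P \geq p\), so \(P\) is clamped to \([0, p]\).

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-x * math.log2(x) for x in probs if x > 0)


def h_b(a):
    """Entropy of a Bernoulli variable with probability a."""
    return entropy([a, 1 - a])


def h_t(a, b):
    """Entropy of a ternary variable with probabilities a, b, 1 - a - b."""
    return entropy([a, b, 1 - a - b])


def rdp_bernoulli(D, P, p):
    """Closed-form R(D, P) in bits for X ~ Bern(p), assuming 0 < p < 1/2."""
    if not 0 < p < 0.5:
        raise ValueError("this sketch assumes 0 < p < 1/2")
    if D < 0:
        raise ValueError("D must be non-negative")
    q = 1 - p
    # Assumption: for P >= p the perceptual constraint no longer binds.
    P = min(max(P, 0.0), p)
    D1 = P / (1 - 2 * (p - P))
    D2 = 2 * p * q - (q - p) * P
    if D >= D2:          # region S_3
        return 0.0
    if D < D1:           # region S_1: classical rate-distortion function
        return h_b(p) - h_b(D)
    # region S_2: the perceptual constraint is active
    return (2 * h_b(p) + h_b(p - P)
            - h_t((D - P) / 2, p) - h_t((D + P) / 2, q))


def channel_terms(p, a, b):
    """Objective and constraints of the definition for one binary channel.

    a = P(Xhat = 1 | X = 0), b = P(Xhat = 0 | X = 1).
    Returns (mutual information in bits, expected Hamming distortion,
    total variation distance between p_X and p_Xhat).
    """
    q = 1 - p
    p_hat1 = q * a + p * (1 - b)                   # P(Xhat = 1)
    mi = h_b(p_hat1) - (q * h_b(a) + p * h_b(b))   # H(Xhat) - H(Xhat | X)
    distortion = q * a + p * b                     # P(Xhat != X)
    tv = abs(p - p_hat1)                           # TV distance, binary case
    return mi, distortion, tv


if __name__ == "__main__":
    mi, dist, tv = channel_terms(0.3, 0.1, 0.2)
    print("channel: I =", mi, "distortion =", dist, "TV =", tv)
    print("R(D, P) at that point:", rdp_bernoulli(dist, tv, 0.3))
```

Because \(R(D, P)\) is defined as a minimum over channels, the mutual information of any particular channel should be at least \(R(D, P)\) evaluated at that channel's own distortion and total variation distance, which is what the final two printed values compare. As a further sanity check on the threshold \(D_1\): the classical test channel for a Bernoulli source has output distribution \(\text{Bern}\big(\tfrac{p - D}{1 - 2D}\big)\), whose total variation distance from the source is \(\tfrac{D(1 - 2p)}{1 - 2D}\), and this equals \(P\) exactly at \(D = D_1\).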
### Summary
By adding perceptual quality as a third dimension, this paper revisits rate-distortion theory for lossy compression and characterizes the trade-off among rate, distortion, and perception. It establishes this trade-off through theoretical analysis and illustrates it experimentally, offering a new perspective and guidance for the design and evaluation of practical compression methods.