Generative Visual Compression: A Review

Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye
2024-02-03
Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promising applications in ultra-low bitrate communication, user-specified reconstruction/filtering, and intelligent machine analysis. In particular, we review the visual data compression methodologies with deep generative models, and summarize how compact representation and high-fidelity reconstruction could be actualized via generative techniques. In addition, we generalize related generative compression technologies for machine vision and intelligent analytics. Finally, we discuss the fundamental challenges on generative visual compression techniques and envision their future research directions.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The core problem this paper addresses is **how to use generative models (such as the Variational Autoencoder (VAE), Generative Adversarial Network (GAN), and Diffusion Model (DM)) to achieve efficient visual data compression**, so that high-quality visual reconstruction is obtained at minimal encoding cost. Specifically, the paper focuses on the following aspects:
1. **Improving compression efficiency**: Traditional image and video compression standards (such as H.264/AVC, H.265/HEVC, and H.266/VVC) run into bottlenecks when dealing with large-scale visual data. Generative models can learn more compact feature representations and thus achieve a higher compression ratio.
2. **Enhancing reconstruction quality**: Generative models not only compress data but can also reconstruct high-quality images or videos from these compact features through their strong inference capabilities. They have shown great potential in application scenarios such as ultra-low-bitrate communication, user-specified reconstruction/filtering, and intelligent machine analysis.
3. **Diversified functions**: Generative models can support multiple advanced functions, such as cross-modal encoding (encoding images into text), conceptual encoding (decomposing images into structural information and texture codes), temporal evolution encoding (using inter-frame motion information for compression), and full-dimensional data encoding (such as 3D point clouds and panoramas).
4. **Applications in machine vision**: Beyond human vision, generative models can also serve machine vision tasks, ensuring that compressed visual data still supports high task performance. The paper covers two approaches here, pixel-domain analysis and feature-domain analysis, each optimized for different types of machine tasks.
In summary, this paper surveys the latest progress of generative models in visual compression and outlines future research directions, in particular how to overcome current challenges such as the selection of evaluation metrics, the improvement of robustness and generalization ability, task-independent compression and communication design, and standardization and deployment.