Abstract: Although learning-based image and video coding techniques have developed rapidly, their signal-fidelity-driven objectives cause them to diverge from a highly effective and efficient coding framework that serves both humans and machines. In this paper, we address this issue by using the power of generative models to bridge the gap between full fidelity (for human vision) and high discrimination (for machine vision). Relying on existing pretrained generative adversarial networks (GANs), we build a GAN inversion framework that projects the image into a low-dimensional natural image manifold. In this manifold, the feature, termed the latent code, is highly discriminative and also encodes the appearance information of the image. By applying a variational bit-rate constraint with a hyperprior model to model and suppress the entropy of the image manifold code, our method is capable of fulfilling the needs of both machine and human vision at very low bit-rates. To improve the visual quality of image reconstruction, we further propose multiple latent codes and scalable inversion. The former obtains several latent codes in the inversion, while the latter additionally compresses and transmits a shallow compact feature to support visual reconstruction. Experimental results demonstrate the superiority of our method in both human vision tasks, i.e., image reconstruction, and machine vision tasks, including semantic parsing and attribute prediction.

Group Image Compression for Dual Use of Machine and Human Vision

An Efficient Compressive Convolutional Network for Unified Object Detection and Image Compression

Joint super-resolution-based fast face image coding for human and machine vision

Facial Image Compression via Neural Image Manifold Compression

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

Preprocessing Enhanced Image Compression for Machine Vision

A Deep Image Compression Framework for Face Recognition

Machine Perception-Driven Image Compression: A Layered Generative Approach

Region-of-interest and channel attention-based joint optimization of image compression and computer vision

Deep Image Compression Towards Machine Vision: A Unified Optimization Framework

A Unified Image Compression Method for Human Perception and Multiple Vision Tasks

2C-Net: integrate image compression and classification via deep neural network

Progressive Deep Image Compression for Hybrid Contexts of Image Classification and Reconstruction

Towards Analysis-Friendly Face Representation with Scalable Feature and Texture Compression

GFSCompNet: remote sensing image compression network based on global feature-assisted segmentation

Learning based Facial Image Compression with semantic fidelity metric

Content-aware Facial Image Compression with Deep Learning Method

Unified and Scalable Deep Image Compression Framework for Human and Machine

Slimmable Multi-Task Image Compression for Human and Machine Vision