Abstract: In Internet of Things (IoT) communications, visual data are frequently processed among intelligent devices using artificial intelligence algorithms, which replace humans for analysis and decision-making and only occasionally require human scrutiny. However, due to the high redundancy of compressive encoders, existing image coding solutions for machine vision are inefficient at runtime. To balance rate-accuracy performance against the efficiency of image compression for machine vision while still attaining high-quality reconstructed images for human vision, this paper introduces a novel slimmable multi-task compression framework for human and machine vision in visual IoT applications. First, image compression for human and machine vision under constraints on bandwidth, latency, and computational resources is modeled as a multi-task optimization problem. Second, slimmable encoders are employed for multiple human and machine vision tasks, in which the parameters of the sub-encoder for machine vision tasks are shared among all tasks and jointly learned. Third, to solve the feature matching problem between the latent representation and the intermediate features of deep vision networks, feature transformation networks are introduced as decoders for machine vision feature compression. Finally, the proposed framework is applied to scenarios that combine human and machine vision tasks, e.g., object detection and image reconstruction. Experimental results show that, in two vision task scenarios, the proposed method outperforms baselines and other image compression approaches on machine vision tasks with higher efficiency (shorter latency) while retaining comparable image reconstruction quality.

Slimmable Multi-Task Image Compression for Human and Machine Vision
