Image Coding for Machines with Omnipotent Feature Learning

Ruoyu Feng, Xin Jin, Zongyu Guo, Runsen Feng, Yixin Gao, Tianyu He, Zhizheng Zhang, Simeng Sun, Zhibo Chen

DOI: https://doi.org/10.48550/arXiv.2207.01932

2022-07-07

Abstract: Image Coding for Machines (ICM) aims to compress images for AI tasks analysis rather than meeting human perception. Learning a kind of feature that is both general (for AI tasks) and compact (for compression) is pivotal for its success. In this paper, we attempt to develop an ICM framework by learning universal features while also considering compression. We name such features as omnipotent features and the corresponding framework as Omni-ICM. Considering self-supervised learning (SSL) improves feature generalization, we integrate it with the compression task into the Omni-ICM framework to learn omnipotent features. However, it is non-trivial to coordinate semantics modeling in SSL and redundancy removing in compression, so we design a novel information filtering (IF) module between them by co-optimization of instance distinguishment and entropy minimization to adaptively drop information that is weakly related to AI tasks (e.g., some texture redundancy). Different from previous task-specific solutions, Omni-ICM could directly support AI tasks analysis based on the learned omnipotent features without joint training or extra transformation. Albeit simple and intuitive, Omni-ICM significantly outperforms existing traditional and learning-based codecs on multiple fundamental vision tasks.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to compress images efficiently for machine analysis tasks while preserving the semantic information that machines rely on. Existing image compression methods are mainly optimized for human-perceived quality, for example by minimizing mean squared error (MSE) or maximizing multi-scale structural similarity (MS-SSIM), and they perform poorly when used to support high-level machine vision tasks such as object detection and instance segmentation. The reason is that metrics of human-perceived quality differ from metrics of machine-task performance, such as classification accuracy.

To address this gap, the paper proposes a framework named Omni-ICM. Its goal is to learn features that apply broadly across different AI tasks and are also compact enough to compress well, which the authors call "omnipotent features". The framework combines self-supervised learning (SSL) with an information filtering (IF) module. The IF module adaptively discards redundant information that is only weakly related to AI tasks, which encourages the learned representations to be sparser and more compact. This improves compression efficiency while keeping the compressed features applicable to multiple downstream tasks.

In summary, the core problem is to develop an image coding framework that compresses images effectively for machine analysis, retains key semantic information, and overcomes the limitations of existing compression methods on machine tasks.
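The co-optimization of instance distinguishment and entropy minimization mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature, the weight `lam`, and the use of an empirical symbol entropy in place of a learned entropy model are all assumptions made for illustration.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.2):
    """Contrastive (InfoNCE) loss: row i of z1 should match row i of z2 only."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def rate_bits(symbols):
    """Empirical entropy in bits per symbol of quantized features.

    Assumption: a stand-in for the rate estimated by a learned entropy model.
    """
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_objective(z1, z2, quantized, lam=0.1):
    """Hypothetical combined loss: instance discrimination plus weighted rate."""
    return info_nce(z1, z2) + lam * rate_bits(quantized)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.05 * rng.normal(size=(8, 16))  # second augmented view
    quantized = np.round(view_a).astype(int)            # crude quantization
    print(joint_objective(view_a, view_b, quantized))
```

A larger `lam` pushes the objective toward fewer bits at the cost of less discriminative features, which is the trade-off the IF module is described as balancing.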

LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model

Image Coding for Machines with Object Region Learning

Bridging the gap between image coding for machines and humans

Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss

Unified and Scalable Deep Image Compression Framework for Human and Machine

Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs

Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts

Remote Sensing Image Coding for Machines on Semantic Segmentation via Contrastive Learning

Slimmable Multi-Task Image Compression for Human and Machine Vision

Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Image Coding for Machines with Edge Information Learning Using Segment Anything

Hierarchical Image Feature Compression for Machines via Feature Sparsity Learning

Deep Image Compression Toward Machine Vision: A Unified Optimization Framework

Deep Image Compression Towards Machine Vision: A Unified Optimization Framework

Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics

A Unified Active and Semi-Supervised Learning Framework for Image Compression

End-to-End Learned Scalable Multilayer Feature Compression for Machine Vision Tasks

A Unified Image Compression Method for Human Perception and Multiple Vision Tasks