Scalable Image Coding for Humans and Machines Using Feature Fusion Network

Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe

2024-06-17

Abstract: As image recognition models become more prevalent, scalable coding methods for machines and humans are becoming increasingly important. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding is effective because humans occasionally need to check the images. Existing image compression methods for humans and machines meet these requirements to some extent. However, they are effective only for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model that provides additional information to facilitate image decoding for humans. The features of these compression models are fused by a feature fusion network to achieve efficient image compression. The number of parameters in our additional information compression model is reduced by enabling the feature fusion network to combine features of different sizes. We confirm that the feature fusion network efficiently combines the image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating image compression performance in terms of decoded image quality and bitrate.

Computer Vision and Pattern Recognition, Multimedia
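The abstract states that the feature fusion network can combine features of different sizes. The paper's exact fusion architecture is not given here, so the following is only a minimal sketch under our own assumptions: the additional-information feature map is smaller than the machine-model feature map by one integer factor on both axes, it is enlarged with nearest-neighbor upsampling, concatenated with the machine features along the channel axis, and mixed with a 1x1 convolution. The names `upsample_nearest` and `fuse` and all example values are hypothetical. Plain Python lists are used so the block runs without a deep learning framework.

```python
from typing import List

# Feature maps are nested lists indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Enlarge every channel by repeating each value `factor` times along both axes."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel
         for _ in range(factor)]
        for channel in fmap
    ]


def fuse(machine_feat: FeatureMap, extra_feat: FeatureMap,
         weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical fusion: resize, concatenate channels, mix with a 1x1 convolution.

    weights[o][c] maps channel c of the concatenated map to output channel o.
    """
    height, width = len(machine_feat[0]), len(machine_feat[0][0])
    factor = height // len(extra_feat[0])
    if factor * len(extra_feat[0]) != height or factor * len(extra_feat[0][0]) != width:
        raise ValueError("extra_feat must match, or be smaller by one integer factor")
    # Bring the smaller additional-information features to the machine-feature size.
    extra_up = upsample_nearest(extra_feat, factor)
    stacked = machine_feat + extra_up  # concatenation along the channel axis
    if any(len(w_o) != len(stacked) for w_o in weights):
        raise ValueError("each weight row needs one entry per concatenated channel")
    # A 1x1 convolution is a per-pixel weighted sum over channels plus a bias.
    return [
        [[sum(w_oc * stacked[c][y][x] for c, w_oc in enumerate(w_o)) + b_o
          for x in range(width)]
         for y in range(height)]
        for w_o, b_o in zip(weights, bias)
    ]


if __name__ == "__main__":
    machine = [[[1.0, 2.0], [3.0, 4.0]]]  # 1 channel at 2x2
    extra = [[[10.0]]]                     # 1 channel at 1x1 (half resolution)
    print(fuse(machine, extra, weights=[[1.0, 0.5]], bias=[0.0]))
    # [[[6.0, 7.0], [8.0, 9.0]]]
```

In this sketch the 1x1 mixing layer has C_out × C_in weights plus C_out biases, so its size depends only on channel counts and not on the spatial resolution of the inputs.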

What problem does this paper attempt to address?

The paper addresses the limited generality of image compression methods that serve both human vision and machine recognition models. Existing methods meet the needs of human vision and of particular image recognition models to some extent, but they are usually optimized for one specific recognition model and therefore struggle when several different image recognition tasks must be supported.

To solve this, the paper proposes a learning-based scalable image coding method that is compatible with various image recognition models and can also decode images efficiently for human viewing. The method combines a machine-oriented image compression model (SA-ICM) with an additional information compression model, and the features of these two models are fused in a feature fusion network to achieve efficient image compression and reconstruction.

The main contributions of the paper are:

1. **Generality**: The proposed image coding method does not rely on a specific image recognition model and is suitable for various image recognition tasks.
2. **Efficiency**: The feature fusion network reduces the number of parameters and improves image compression efficiency.
3. **Flexibility**: The method adapts to different image recognition models without modifying the image compression model on the device.

Experiments show that the method outperforms existing methods in image compression performance, especially at low bitrates. The paper also shows that the number of parameters in the additional information compression model can be reduced further by allowing the feature fusion network to combine features of different sizes, which lowers the computational burden.
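In the scalable scheme summarized above, the machine-oriented bitstream alone serves the image recognition side, while decoding for human viewing uses it together with the additional information. The following minimal sketch shows the resulting rate accounting in bits per pixel (bpp). The `ScalableBitstream` class, the helper functions, and the bit counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScalableBitstream:
    """Hypothetical two-layer code produced by a scalable encoder."""
    machine_bits: int     # base layer, decoded for image recognition models
    additional_bits: int  # extra layer, decoded only when a human views the image


def bits_per_pixel(bits: int, height: int, width: int) -> float:
    """Rate normalized by the number of pixels in the original image."""
    return bits / (height * width)


def rate_for_target(stream: ScalableBitstream, height: int, width: int,
                    for_human: bool) -> float:
    """Machines read only the base layer; humans need both layers."""
    bits = stream.machine_bits
    if for_human:
        bits += stream.additional_bits
    return bits_per_pixel(bits, height, width)


if __name__ == "__main__":
    stream = ScalableBitstream(machine_bits=40_000, additional_bits=60_000)
    print(rate_for_target(stream, 512, 512, for_human=False))  # 0.152587890625
    print(rate_for_target(stream, 512, 512, for_human=True))   # 0.3814697265625
```

Because the base layer is shared, serving a human viewer costs only the additional layer's bits on top of what the recognition side already receives.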
