Scalable Image Coding for Humans and Machines Using Feature Fusion Network

Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe
2024-06-17
Abstract: As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, scalable coding proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model that provides additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. The additional information compression model in our method is adjusted to reduce its number of parameters by enabling the feature fusion network to combine features of different sizes. Our experiments confirm that the feature fusion network efficiently combines the image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
The paper addresses the limited generality of image compression methods that serve both human vision and machine recognition. Existing methods meet the needs of human viewers and of specific image recognition models to some extent, but they are usually optimized for one particular recognition model and run into difficulties when several different recognition tasks must be supported.

To solve this, the paper proposes a learning-based scalable image coding method that is compatible with various image recognition models and can also efficiently decode images for human viewing. The method combines a machine-oriented image compression model (SA-ICM) with an additional information compression model. The features of these two models are fused in a feature fusion network to achieve efficient image compression and reconstruction.

The main contributions of the paper include:

1. **Generality**: The proposed image coding method does not rely on a specific image recognition model and is suitable for various image recognition tasks.
2. **Efficiency**: By using a feature fusion network, the number of parameters is reduced and image compression efficiency is improved.
3. **Flexibility**: It can adapt to different image recognition models without modifying the image compression model on the device.

Experimental validation shows that this method outperforms existing methods in terms of image compression performance, especially at low bit rates. Additionally, the paper explores how to further reduce the number of parameters of the additional information compression model by adjusting it so that the feature fusion network can combine features of different sizes, thereby reducing the computational burden.
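
To make the idea of combining features of different sizes concrete, here is a minimal sketch in plain Python. It is not the paper's network: the actual feature fusion network is learned, and its layers are not described in this summary. The sketch assumes one simple way such a fusion could work, namely upsampling the smaller feature map, concatenating it with the larger one along the channel axis, and mixing the channels with fixed 1x1 weights. The names `upsample_nearest`, `fuse`, `machine_feat`, `extra_feat`, and `weights` are all hypothetical.

```python
# Toy sketch only: every name, shape, and weight here is an assumption,
# not taken from the paper. Feature maps are lists of 2-D channels.

def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2-D channel (a list of rows)."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(machine_feat, extra_feat, weights):
    """Fuse a large feature map with a spatially smaller one.

    extra_feat is upsampled to the size of machine_feat, concatenated with
    it along the channel axis, and mixed by a 1x1 convolution, where
    weights[o][i] is the weight of input channel i for output channel o.
    """
    height, width = len(machine_feat[0]), len(machine_feat[0][0])
    small_h, small_w = len(extra_feat[0]), len(extra_feat[0][0])
    factor = height // small_h
    if small_h * factor != height or small_w * factor != width:
        raise ValueError("sizes must differ by one integer factor")

    # Channel-wise concatenation after bringing both maps to the same size.
    stacked = machine_feat + [upsample_nearest(c, factor) for c in extra_feat]
    if any(len(w_out) != len(stacked) for w_out in weights):
        raise ValueError("need one weight per stacked input channel")

    # 1x1 convolution: a weighted sum over channels at every pixel.
    return [
        [
            [sum(w * ch[y][x] for w, ch in zip(w_out, stacked))
             for x in range(width)]
            for y in range(height)
        ]
        for w_out in weights
    ]


machine_feat = [[[1, 2], [3, 4]]]  # 1 channel, 2x2 (machine-oriented branch)
extra_feat = [[[10]]]              # 1 channel, 1x1 (additional information)
fused = fuse(machine_feat, extra_feat, weights=[[1.0, 0.5]])
print(fused)  # [[[6.0, 7.0], [8.0, 9.0]]]
```

In this toy setting the additional-information branch carries a quarter as many values as the machine-oriented branch, which illustrates why allowing mismatched feature sizes can let that branch be smaller. How the paper actually resizes and mixes features may differ.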