Scalable image coding with enhancement features for human and machine

Ying Wu, Ping An, Chao Yang, XinPeng Huang
DOI: https://doi.org/10.1007/s00530-024-01279-y
IF: 3.9
2024-03-12
Multimedia Systems
Abstract: The past decade has seen significant advancements in computer vision technologies, resulting in increasing consumption of images and videos by both humans and machines. Although machines are usually the primary consumers, there are many applications where human involvement is indispensable. In this paper, we propose a novel image coding technique that targets machines while ensuring compatibility with human consumption. The proposed codec generates two distinct bitstreams: the reconstruction feature bitstreams and the enhancement feature bitstreams. The former are designed to facilitate image reconstruction for human consumption and vision tasks for machine consumption, while the latter are optimized for high-quality vision tasks. To achieve this goal, we introduce the Mask Multilayer Fusion Encoder (MMFE), which integrates multi-scale visual prior masks into partial channel features of the encoder. Additionally, because features are significantly distorted at low bitrates, we propose a Local Feature Fusion Module (LFFM) that aggregates semantic information from the reconstruction features to obtain enhancement features and thereby improve the performance of vision tasks. Experimental results demonstrate that our scalable codec provides significant bitrate savings of 26–77% on machine vision tasks compared to state-of-the-art image codecs, while maintaining comparable image reconstruction performance. The proposed codec represents a significant advancement in the field of image coding, with the potential to improve both human and machine consumption of visual media.
computer science, information systems, theory & methods