Scalable Image Coding for Human and Machines: Based on Partial Channel Context Model

Yunhui Shi, Jiawei Ren, Lilong Wang, Jin Wang, Jiale Liu
DOI: https://doi.org/10.1109/ccdc62350.2024.10587429
2024-01-01
Abstract: In recent years, the amount of visual data generated by edge devices has increased substantially. Machines typically process this data to accomplish tasks such as object detection without human visual judgment. However, human viewing is sometimes required during human-robot interaction, and humans and machines differ significantly in the information they focus on. To tackle this issue, we propose an end-to-end learning-based image coding framework that aims to strike a balance between human and machine vision tasks. Unlike a compression framework that targets only human vision, our framework uses a portion of the latent space for both machine vision and human vision. Because of this difference, correlations still exist between tasks, so we propose a partial-channel context model to improve coding performance. Our scalable coding framework supports human and machine vision simultaneously by partitioning the latent space. Machine vision tasks are handled by a subset of the latent space, referred to as the base layer. The more complex human visual reconstruction task uses a larger portion of the latent space, comprising both the base and enhancement layers. In the experimental section, we present the performance of human visual reconstruction and machine vision tasks and compare it with other benchmarks. The experiments demonstrate that our framework achieves a 28.27% to 38.16% reduction in bitrate for machine vision tasks and matches the performance of state-of-the-art image codecs in terms of input reconstruction.
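The latent-space partition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the channel-wise split, and the sizes used (192 latent channels, 128 of them in the base layer) are assumptions chosen only for illustration. It shows that a machine-vision task would read the base layer alone, while human-vision reconstruction would read the base and enhancement layers together.

```python
import numpy as np


def split_latent(latent, base_channels):
    """Split a (C, H, W) latent along the channel axis.

    Hypothetical helper: the abstract only says the latent space is
    partitioned; a channel-wise split is an assumption made here.
    """
    total = latent.shape[0]
    if not 0 < base_channels < total:
        raise ValueError("base_channels must be between 1 and C - 1")
    base = latent[:base_channels]          # read by machine-vision tasks
    enhancement = latent[base_channels:]   # extra channels for human viewing
    return base, enhancement


def human_vision_latent(base, enhancement):
    """Recombine both layers for full image reconstruction."""
    return np.concatenate([base, enhancement], axis=0)


# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
latent = rng.standard_normal((192, 16, 16))

base, enhancement = split_latent(latent, base_channels=128)
full = human_vision_latent(base, enhancement)

print(base.shape, enhancement.shape, full.shape)
```

In such a layout, a decoder serving only a machine task would need to receive just the base-layer bits, which is the intuition behind the bitrate savings the abstract reports for machine vision.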