VVC+M: Plug and Play Scalable Image Coding for Humans and Machines

Alon Harell, Yalda Foroutan, Ivan V. Bajic
2023-05-17
Abstract: Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both machine task and human perception, this results in sub-optimal rate-distortion (RD) performance for the machine side. We focus on the case of images, proposing to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach we improve an existing scalable codec to achieve superior RD performance on the machine task, while remaining competitive for human perception. Moreover, our approach can be trained post-hoc for any given ICM scheme, and without creating a coupling between the quality of the machine analysis and human vision.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses several key issues in the field of compression for machines. Specifically:

1. **Improving Rate-Distortion Performance for Machine Tasks**:
   - Existing scalable codecs are usually trained jointly for the machine task and for human perceptual quality, which leads to sub-optimal rate-distortion (RD) performance on the machine task. The paper improves machine-side RD performance by optimizing the base layer for the machine task alone.
2. **Achieving Scalable Coding for Both Machines and Humans**:
   - By using the residual coding capabilities of video codecs, any image compression for machines (ICM) scheme can be turned into a scalable human-machine codec. The base layer and the enhancement layer are optimized independently, so the quality of the machine analysis is not coupled to the quality of the human-viewable reconstruction.
3. **Enhancing Rate-Distortion Performance of the Enhancement Layer**:
   - Using an existing, efficient video codec (such as VVC) as the enhancement layer gives competitive RD performance for human perception, and the enhancement side can be trained post-hoc for a given ICM scheme.

In summary, the paper improves an existing scalable codec so that it achieves superior RD performance on the machine task while remaining competitive for human perception.
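The layered structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the base layer here is a hypothetical stand-in (block-average downsampling instead of a learned ICM codec), and the enhancement layer is plain uniform quantization of the residual instead of VVC. The function names `base_encode`, `base_preview`, `enhancement_encode`, and `enhancement_decode` are assumptions made for illustration. The point it shows is that the enhancement layer only codes what the base-layer preview misses, while the base layer itself is left untouched.

```python
import numpy as np

def base_encode(image, factor=4):
    """Hypothetical stand-in for an ICM base layer: block-average downsampling."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def base_preview(latent, factor=4):
    """Rough preview of the input, reconstructed from the base layer alone."""
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

def enhancement_encode(image, preview, step=8.0):
    """Stand-in for the enhancement layer (NOT VVC): quantize the residual."""
    return np.round((image - preview) / step).astype(np.int32)

def enhancement_decode(symbols, preview, step=8.0):
    """Add the dequantized residual back onto the base-layer preview."""
    return preview + symbols.astype(np.float64) * step

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic grayscale "image" so the example is self-contained.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(32, 32))

# Machine side: only the base layer is needed.
latent = base_encode(image)

# Human side: base-layer preview plus the enhancement-layer residual.
preview = base_preview(latent)
symbols = enhancement_encode(image, preview)
reconstruction = enhancement_decode(symbols, preview)

print(f"preview PSNR:        {psnr(image, preview):.2f} dB")
print(f"reconstruction PSNR: {psnr(image, reconstruction):.2f} dB")
```

Because the enhancement functions take the preview as a fixed input, they can be swapped or retuned without changing `latent`, which mirrors the decoupling between machine analysis and human viewing that the summary describes.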