Group Image Compression for Dual Use of Machine and Human Vision

Xin Fang, Xiaolin Wu, Fan Li, Yiping Duan, Xiaoming Tao
DOI: https://doi.org/10.1109/tcsvt.2024.3486558
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Faces in a scene of a human group, if coded with sufficient precision, can be analyzed by computer for machine vision tasks involving faces. But this requires storing and communicating them at a very high bit rate. Traditional ROI-based image compression methods are ill-suited to coding many faces at high precision against a complex background. In this work, we propose a novel group image compression neural network (GICNet) of two layers: 1) the face layer, dedicated to machine analysis, in which face bounding boxes are first cropped out of the background and converted to a compression-friendly, canonical, sketch-guided representation of fixed resolution, both for compact coding and to facilitate downstream tasks without additional preprocessing; 2) the background layer, dedicated to overall perceptual quality for human vision, in which face residuals and background elements are coded and appended to the code stream. Experimental results demonstrate the effectiveness of the proposed GICNet, which saves 13%-57% of the bitrate for machine vision applications while maintaining competitive perceptual quality.
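The two-layer split described in the abstract can be illustrated with a toy sketch. This is not GICNet: the learned, sketch-guided face representation is replaced here by a plain nearest-neighbor downsample, no entropy coding is performed, and every name (`CANON`, `resize_nn`, `split_layers`, `merge_layers`) is hypothetical. It assumes a grayscale image and non-overlapping face boxes, and only shows how fixed-resolution face crops plus face residuals in the background layer can together reproduce the original image.

```python
import numpy as np

CANON = 8  # hypothetical canonical face resolution (pixels per side)


def resize_nn(img, h, w):
    """Nearest-neighbor resize of a 2-D array to shape (h, w)."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]


def split_layers(image, boxes):
    """Split a grayscale image into a face layer and a background layer.

    boxes: non-overlapping (top, left, height, width) face bounding boxes.
    Returns (faces, background). Each face is a CANON x CANON crop; the
    background keeps the original pixels outside the boxes and, inside
    each box, the residual that the face layer fails to reproduce.
    """
    background = image.astype(np.int32)  # signed copy, residuals can be negative
    faces = []
    for top, left, h, w in boxes:
        crop = image[top:top + h, left:left + w]
        canon = resize_nn(crop, CANON, CANON)   # fixed-resolution face
        approx = resize_nn(canon, h, w)         # what a decoder could rebuild
        faces.append(canon)
        background[top:top + h, left:left + w] = crop.astype(np.int32) - approx
    return faces, background


def merge_layers(faces, background, boxes):
    """Rebuild the full image from the face layer and the background layer."""
    image = background.copy()
    for canon, (top, left, h, w) in zip(faces, boxes):
        image[top:top + h, left:left + w] += resize_nn(canon, h, w)
    return image
```

In this toy version the face layer alone is what a machine-vision task would consume, while the background layer is only needed when the full image is reconstructed for human viewing.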