MFLFC: Multi-Frame Fusion Based Low-Resolution Feature Compression for Object Tracking

Yi Peng, Zixiang Zhang, Li Yu
DOI: https://doi.org/10.1109/icip51287.2024.10648094
2024-01-01
Abstract: In deep learning-driven computer vision, Video Coding for Machines (VCM) optimizes for machine vision performance rather than human visual quality. This emphasis has led VCM to be divided into two tracks, one of which targets Feature Coding for Machines (FCM). Although recent video feature compression methods show improvements, current FCM methods struggle to eliminate spatial redundancy within intra-features and to fully exploit reference features when removing temporal redundancy. To address these issues, we propose a multi-frame fusion based low-resolution feature compression (MFLFC) method for object tracking. We introduce hierarchical coding for compressing video features, employing MFLFC to handle both P-features and B-features. Specifically, the proposed model incorporates a low-resolution layer to reduce spatial redundancy and a fusion layer to reduce temporal redundancy. Additionally, we introduce a three-fold training strategy that improves training efficiency and stability and contributes to better performance. Experimental results show that the proposed MFLFC outperforms previous approaches, achieving up to 82.21% BD-rate reduction.
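As a rough illustration of the hierarchical coding idea mentioned in the abstract, the sketch below lists one possible coding order for the features between two anchors. It is not the authors' implementation: the function name `hierarchical_gop_order`, the dyadic splitting rule, and the choice of references are assumptions borrowed from conventional hierarchical-B video coding, since the abstract does not specify the actual structure. The end anchor is coded first as a P-feature from a single reference, and each intermediate position is then coded as a B-feature from two already-coded neighbors.

```python
def hierarchical_gop_order(start, end):
    """Return a coding order for features in (start, end].

    Assumption (not from the paper): `start` is already coded, `end` is
    coded as a P-feature referencing `start`, and each midpoint is coded
    as a B-feature referencing the two coded features that bracket it.
    Each entry is (feature_index, kind, reference_indices).
    """
    order = [(end, "P", (start,))]

    def split(lo, hi):
        # Stop when no uncoded feature lies strictly between lo and hi.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, "B", (lo, hi)))
        split(lo, mid)
        split(mid, hi)

    split(start, end)
    return order


if __name__ == "__main__":
    for index, kind, refs in hierarchical_gop_order(0, 4):
        print(f"feature {index}: {kind}-feature, references {refs}")
```

In this sketch every B-feature has one past and one future reference available at decode time, which is the kind of setting where a multi-frame fusion step could combine reference features.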