Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines

Honglei Zhang, Jukka I. Ahonen, Nam Le, Ruiying Yang, Francesco Cricri
2024-06-18
Abstract: This paper investigates the efficacy of jointly optimizing content-specific post-processing filters to adapt a human-oriented video/image codec into a codec suitable for machine vision tasks. By observing that artifacts produced by video/image codecs are content-dependent, we propose a novel training strategy based on competitive learning principles. This strategy assigns training samples to filters dynamically, in a fuzzy manner, which further optimizes the winning filter on the given sample. Inspired by simulated annealing optimization techniques, we employ a softmax function with a temperature variable as the weight allocation function to mitigate the effects of random initialization. Our evaluation, conducted on a system utilizing multiple post-processing filters within a Versatile Video Coding (VVC) codec framework, demonstrates the superiority of content-specific filters trained with our proposed strategies, specifically when images are processed in blocks. Using VVC reference software VTM 12.0 as the anchor, experiments on the OpenImages dataset show an improvement in the BD-rate reduction from -41.3% and -44.6% to -42.3% and -44.7% for object detection and instance segmentation tasks, respectively, compared to independently trained filters. The statistics of the filter usage align with our hypothesis and underscore the importance of jointly optimizing filters for both content and reconstruction quality. Our findings pave the way for further improving the performance of video/image codecs.
Computer Vision and Pattern Recognition, Machine Learning, Multimedia
What problem does this paper attempt to address?
The paper addresses how to adapt a human-oriented video/image codec for machine vision tasks. The researchers observe that the artifacts produced by traditional video/image codecs depend not only on the compression ratio but also on the content of the input data. To handle different kinds of content, they propose a training strategy based on competitive learning that jointly optimizes multiple content-specific post-processing filters. During training, each sample is assigned to the filters dynamically and in a fuzzy (weighted) manner, so that the filter that performs best on a sample is optimized further on it. The weights come from a softmax function with a temperature variable, an idea inspired by simulated annealing, which mitigates the effects of random initialization. In experiments on the OpenImages dataset with VVC reference software VTM 12.0 as the anchor, the jointly trained filters outperform independently trained filters, particularly when images are processed in blocks: the BD-rate reduction improves from -41.3% to -42.3% for object detection and from -44.6% to -44.7% for instance segmentation. The filter usage statistics are consistent with the authors' hypothesis and underscore the importance of optimizing the filters jointly for both content and reconstruction quality.
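
The weight allocation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that a softmax with a temperature variable is used, so the choice of per-filter loss as the softmax input, the geometric temperature schedule, and the names `allocation_weights` and `annealed_temperature` are all assumptions made for illustration.

```python
import math


def allocation_weights(losses, temperature):
    """Softmax over negative per-filter losses (assumed input).

    A lower loss gives a larger weight. A high temperature yields nearly
    uniform weights; a low temperature approaches winner-take-all.
    """
    scaled = [-loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, total_steps, t_start=10.0, t_end=0.01):
    """Hypothetical geometric schedule decaying from t_start to t_end."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac


# Hypothetical losses of three filters on one training sample.
losses = [0.9, 0.5, 0.7]
for step in (0, 50, 99):
    t = annealed_temperature(step, 100)
    weights = allocation_weights(losses, t)
    # Each filter's loss on this sample would be scaled by its weight.
    weighted_loss = sum(w * l for w, l in zip(weights, losses))
    print(step, round(t, 4), [round(w, 3) for w in weights], round(weighted_loss, 3))
```

Early in training the high temperature spreads each sample across all filters, so no filter is locked in by a lucky random initialization. As the temperature falls, the lowest-loss filter receives almost all of the weight and specializes on the content it already handles best.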