Machine vision-aware quality metrics for compressed image and video assessment

Mikhail Dremin, Konstantin Kozhemyakov, Ivan Molodetskikh, Malakhov Kirill, Artur Sagitov, Dmitriy Vatolin
2024-11-11
Abstract: A main goal in developing video-compression algorithms is to enhance human-perceived visual quality while maintaining file size. But modern video-analysis efforts such as detection and recognition, which are integral to video surveillance and autonomous vehicles, involve so much data that they necessitate machine-vision processing with minimal human intervention. In such cases, the video codec must be optimized for machine vision. This paper explores the effects of compression on detection and recognition algorithms (objects, faces, and license plates) and introduces novel full-reference image/video-quality metrics for each task, tailored to machine vision. Experimental results indicate our proposed metrics correlate better with the machine-vision results for the respective tasks than do existing image/video-quality metrics.
Computer Vision and Pattern Recognition, Artificial Intelligence
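The abstract contrasts human-oriented full-reference metrics with task-oriented ones. The sketch below illustrates that contrast under stated assumptions; it is not the paper's CNN-based metric. `psnr` is the standard peak signal-to-noise ratio. `iou` and `detection_agreement` are hypothetical helpers that treat the detector's output on the uncompressed frame as pseudo-ground-truth and measure how many of those detections survive compression. The box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the greedy matching are all assumptions.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-size grayscale
    images given as lists of rows of pixel values."""
    squared_error = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_agreement(ref_boxes, dist_boxes, iou_threshold=0.5):
    """Hypothetical task-aware full-reference score: the fraction of boxes
    detected on the uncompressed frame that are matched one-to-one
    (greedily, by highest IoU) by a box detected on the compressed frame."""
    if not ref_boxes:
        return 1.0  # nothing detected on the reference, so nothing can be lost
    remaining = list(dist_boxes)
    matched = 0
    for ref in ref_boxes:
        best_index, best_iou = -1, iou_threshold
        for i, cand in enumerate(remaining):
            overlap = iou(ref, cand)
            if overlap >= best_iou:
                best_index, best_iou = i, overlap
        if best_index >= 0:
            remaining.pop(best_index)  # each compressed-frame box matches once
            matched += 1
    return matched / len(ref_boxes)


if __name__ == "__main__":
    reference = [[0, 0], [0, 0]]
    compressed = [[0, 0], [0, 255]]
    print("PSNR:", psnr(reference, compressed))
    # In practice, both box lists would come from running the same detector
    # on the uncompressed and compressed frames; these values are made up.
    ref_dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
    cmp_dets = [(1, 1, 10, 10)]
    print("Detection agreement:", detection_agreement(ref_dets, cmp_dets))
```

PSNR only sees pixel differences, whereas the agreement score drops only when compression actually changes what the detector finds. Greedy matching keeps the sketch short; an optimal one-to-one assignment could be substituted.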
What problem does this paper attempt to address?
The problem this paper addresses is how to compress images and video so that machine-vision tasks such as detection and recognition continue to perform well. Traditional video-compression algorithms focus mainly on human-perceived visual quality and largely ignore the impact on automated video-analysis systems, such as the detection and recognition algorithms used in video surveillance and self-driving vehicles. The paper therefore explores how compression affects detection and recognition algorithms (object detection, face recognition, and license-plate recognition) and proposes new full-reference image/video-quality assessment metrics specifically tailored to these machine-vision tasks.

### Main problems and solutions in the paper

1. **Problems**:
   - Existing compression standards (such as JPEG, H.264/AVC, H.265/HEVC, and AV1) mainly optimize human-perceived visual quality.
   - Automated video-analysis systems (such as video surveillance and self-driving vehicles) rely on detection and recognition algorithms, which are highly sensitive to the quality of compressed video.
   - Traditional human-oriented quality-assessment methods (such as PSNR, SSIM, and VMAF) correlate poorly with machine-vision task performance.
2. **Solutions**:
   - **Propose a new method**: The authors propose a new way to measure image and video quality based on detection and recognition performance.
   - **Analyze existing methods**: The authors study how well existing image- and video-quality assessment methods (such as PSNR, SSIM, and VMAF) correlate with detection and recognition performance and find that the correlation is low.
   - **Introduce new metrics**: The authors propose new video-quality assessment metrics based on convolutional neural network (CNN) models.
     These metrics are optimized for object detection, face recognition, and license-plate recognition, respectively, and the authors verify that each correlates strongly with the performance of machine-vision algorithms on its corresponding task.

### Specific objectives

1. **High correlation**: The proposed quality-assessment metrics should correlate more strongly with specific implementations of the three main video-analysis algorithms (object detection, face recognition, and license-plate recognition) while keeping computational complexity low.
2. **Generalization ability**: Consider how well the quality-assessment metrics generalize across different implementations of the detection and recognition algorithms.
3. **Reduce error**: Because detection and recognition algorithms have their own limitations, a more robust assessment method is needed to reduce the error those limitations introduce.

Through these improvements, the paper aims to give developers of video-compression and video-analysis systems better tools for tuning compression parameters, thereby improving machine-vision task performance.
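Both the "high correlation" and "generalization ability" objectives come down to comparing a metric's scores with measured task performance. A minimal sketch follows, assuming Spearman rank correlation (SROCC), a common choice in quality-assessment work; the paper's exact correlation protocol may differ. The function names, the detector labels `det_a` and `det_b`, and all scores are hypothetical. Computing SROCC separately for each detector implementation and then averaging is one simple way to probe generalization.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_report(metric_scores, task_scores_by_detector):
    """SROCC between one metric and each detector's per-video task score,
    plus the mean across detectors as a rough generalization measure."""
    per_detector = {
        name: spearman(metric_scores, scores)
        for name, scores in task_scores_by_detector.items()
    }
    mean_srocc = sum(per_detector.values()) / len(per_detector)
    return per_detector, mean_srocc


if __name__ == "__main__":
    # Hypothetical metric values for three compressed videos, and the
    # task accuracy two detector implementations achieved on each.
    metric = [30.0, 35.0, 40.0]
    tasks = {"det_a": [0.6, 0.7, 0.9], "det_b": [0.8, 0.7, 0.9]}
    per, mean = correlation_report(metric, tasks)
    for name, value in per.items():
        print(name, round(value, 3))
    print("mean SROCC:", round(mean, 3))
```

A metric that scores high for one detector but low for another would show up as a large spread in `per_detector`, even when the mean looks acceptable.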