Abstract:A main goal in developing video-compression algorithms is to enhance human-perceived visual quality while maintaining file size. But modern video-analysis efforts such as detection and recognition, which are integral to video surveillance and autonomous vehicles, involve so much data that they necessitate machine-vision processing with minimal human intervention. In such cases, the video codec must be optimized for machine vision. This paper explores the effects of compression on detection and recognition algorithms (objects, faces, and license plates) and introduces novel full-reference image/video-quality metrics for each task, tailored to machine vision. Experimental results indicate our proposed metrics correlate better with the machine-vision results for the respective tasks than do existing image/video-quality metrics.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: during the video compression process, how to optimize image and video quality to ensure the performance of machine vision tasks (such as detection and recognition). Specifically, traditional video compression algorithms mainly focus on the visual quality perceived by humans and ignore the impact on automated video analysis systems (such as detection and recognition algorithms used in video surveillance and self - driving vehicles). Therefore, this paper aims to explore the impact of compression on detection and recognition algorithms (including object, face and license plate recognition), and propose new full - reference image/video quality assessment metrics that are specifically optimized for machine vision tasks. ### Main problems and solutions in the paper 1. **Problems**: - Existing video compression standards (such as JPEG, H.264/AVC, H.265/HEVC and AV1) mainly optimize the visual quality perceived by humans. - Automated video analysis systems (such as video surveillance and self - driving vehicles) rely on detection and recognition algorithms, and these algorithms are very sensitive to the compressed video quality. - Traditional human visual quality assessment methods (such as PSNR, SSIM and VMAF) have a low correlation with the performance of machine vision tasks. 2. **Solutions**: - **Propose a new method**: The author proposes a new method for measuring image and video quality, which is based on detection and recognition performance. - **Analyze existing methods**: The correlation between existing image quality and video quality assessment methods (such as PSNR, SSIM, VMAF, etc.) and detection and recognition performance is studied, and it is found that the correlation of these methods is low. - **Introduce new metrics**: New video quality assessment metrics based on the convolutional neural network (CNN) model are proposed. These metrics are respectively optimized for object detection, face recognition and license plate recognition tasks, and the high correlation between these metrics and the performance of machine vision algorithms for corresponding tasks is verified. ### Specific objectives 1. **High correlation**: The proposed quality assessment metrics should have a higher correlation with the specific implementations of the three main video analysis algorithms (object detection, face recognition, license plate recognition), while maintaining a low computational complexity. 2. **Generalization ability**: Consider the generalization ability of quality assessment metrics for different implementations of detection/recognition algorithms. 3. **Reduce error**: Due to the limitations of detection/recognition algorithms themselves, a more robust assessment method needs to be developed to reduce the error caused by the limitations of the algorithms themselves. Through these improvements, the paper aims to provide better tools for developers of video compression and video analysis systems to optimize compression parameters, thereby improving the performance of machine vision tasks.

Machine vision-aware quality metrics for compressed image and video assessment

Human Visual Perception Based Image Quality Assessment for Video Prediction

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Deep Image Compression Towards Machine Vision: A Unified Optimization Framework

Deep Image Compression Toward Machine Vision: A Unified Optimization Framework

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics

Using video quality metrics for something other than compression

Machine Perception-Driven Image Compression: A Layered Generative Approach

Task-Driven Video Compression for Humans and Machines: Framework Design and Optimization

A Preprocessing Framework for Video Machine Vision under Compression

Encoder-Quantization-Motion-based Video Quality Metrics

An investigation of machine learning-based video compression techniques

Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling

Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models

Learned Image Compression for Machine Perception

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

End-to-end Compression Towards Machine Vision: Network Architecture Design and Optimization

How to Assess the Quality of Compressed Surveillance Videos Using Face Recognition

Assessing objective quality metrics for JPEG and MPEG point cloud coding

Insights on Video Compression Strategies using Machine Learning

Fine-grained subjective visual quality assessment for high-fidelity compressed images