End-to-end Evaluation of Practical Video Analytics Systems for Face Detection and Recognition

Praneet Singh, Edward J. Delp, Amy R. Reibman
DOI: https://doi.org/10.2352/EI.2023.35.16.AVM-111
2023-10-11
Abstract: Practical video analytics systems that are deployed in bandwidth-constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed on to modules that perform face detection, alignment, and recognition sequentially. Typically, the modules of these systems are evaluated independently using task-specific imbalanced datasets that can distort performance estimates. In this paper, we perform a thorough end-to-end evaluation of a face analytics system using a driving-specific dataset, which enables meaningful interpretations. We demonstrate how independent task evaluations, dataset imbalances, and inconsistent annotations can lead to incorrect system performance estimates. We propose strategies to create balanced evaluation subsets of our dataset and to make its annotations consistent across multiple analytics tasks and scenarios. We then evaluate the end-to-end system performance sequentially to account for task interdependencies. Our experiments show that our approach provides consistent, accurate, and interpretable estimates of the system's performance, which is critical for real-world applications.
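To illustrate why sequential evaluation matters, here is a minimal Python sketch; it is not the authors' code. It assumes a hypothetical per-face record (`FaceRecord` and its field names are invented for illustration), folds alignment into the recognizer's output, and uses made-up toy data. Scoring the recognizer on ground-truth crops hides detector misses, whereas end-to-end scoring counts every annotated face and treats a missed detection as a failure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaceRecord:
    """One annotated face in a test frame (hypothetical record layout)."""
    identity: str                 # ground-truth identity label
    detected: bool                # did the detector localize this face?
    pred_e2e: Optional[str]       # recognizer output on the detector's (aligned) crop
    pred_on_gt_crop: str          # recognizer output on the ground-truth crop


def independent_accuracy(records: List[FaceRecord]) -> float:
    """Recognition scored in isolation on ground-truth crops: detection
    misses and localization errors never reach this metric."""
    return sum(r.pred_on_gt_crop == r.identity for r in records) / len(records)


def end_to_end_accuracy(records: List[FaceRecord]) -> float:
    """Sequential scoring over every annotated face: a face counts only if
    it was detected AND then recognized correctly from the detector's crop."""
    return sum(r.detected and r.pred_e2e == r.identity for r in records) / len(records)


# Hypothetical toy frame with four annotated faces.
records = [
    FaceRecord("A", True, "A", "A"),    # detected and recognized
    FaceRecord("B", True, "B", "B"),    # detected and recognized
    FaceRecord("C", False, None, "C"),  # missed by the detector
    FaceRecord("D", True, "E", "D"),    # detected, but the detector's crop yields a wrong ID
]
print(independent_accuracy(records))  # 1.0
print(end_to_end_accuracy(records))   # 0.5
```

On this toy frame the independent metric reports perfect recognition, while the end-to-end metric reports 50%, because one face is never detected and another is misidentified from the detector's crop.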
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the evaluation of video analytics systems deployed in bandwidth-constrained environments such as autonomous vehicles, focusing on face detection and recognition, and proposes an end-to-end evaluation method. It points out that in prior work the modules of such systems (compression, detection, alignment, and recognition) were usually evaluated independently, ignoring the dependencies between them. In addition, the datasets used for evaluation are typically imbalanced and annotated for a single task, which leads to inaccurate performance estimates. To address these issues, the authors build a face analytics system tailored to driving scenarios and propose an evaluation scheme for it that:

1. **Creates balanced data subsets**: iterative strategies produce smaller but balanced evaluation subsets, which reduces evaluation cost and limits the bias caused by data imbalance.
2. **Corrects dataset annotations**: inconsistent manual annotations across camera modalities and lighting conditions are fixed so that annotations are consistent and accurate under all conditions.
3. **Accounts for interdependencies between tasks**: tasks are evaluated sequentially because, for example, face detection performance directly affects the downstream face recognition task.
4. **Accounts for resource and bandwidth constraints**: the scheme measures how video compression affects system performance and how well lightweight models perform, reflecting resource-constrained real-world deployments.

Through these measures, the authors show how to obtain more accurate and interpretable estimates of system performance, which are crucial for real-world applications.
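The balanced-subset strategy can be illustrated with a simple stratified sampler. This is a hedged sketch, not the paper's iterative procedure: it assumes each sample is a dict with hypothetical `modality`, `light`, and `correct` fields, draws an equal number of samples from every (modality, lighting) stratum, and compares a pooled accuracy with a per-stratum (balanced) accuracy on invented data.

```python
import random
from collections import defaultdict


def group_by(samples, key):
    """Bucket samples by a condition tuple, e.g. (camera modality, lighting)."""
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)
    return strata


def balanced_subset(samples, key, per_stratum, seed=0):
    """Draw the same number of samples from every stratum.

    The quota is capped by the smallest stratum, so no condition can
    dominate the subset. A fixed seed keeps the draw reproducible.
    """
    rng = random.Random(seed)
    strata = group_by(samples, key)
    quota = min(per_stratum, min(len(v) for v in strata.values()))
    subset = []
    for k in sorted(strata):
        items = strata[k]             # fresh list built by group_by
        rng.shuffle(items)            # random draw within the stratum
        subset.extend(items[:quota])
    return subset


def pooled_accuracy(samples):
    """Accuracy over all samples; dominated by the largest stratum."""
    return sum(s["correct"] for s in samples) / len(samples)


def balanced_accuracy(samples, key):
    """Mean of per-stratum accuracies; every condition gets equal weight."""
    per = [pooled_accuracy(v) for v in group_by(samples, key).values()]
    return sum(per) / len(per)


def cond(s):
    """Hypothetical stratum key: (camera modality, lighting)."""
    return (s["modality"], s["light"])


# Hypothetical toy data: 90 daytime RGB faces (81 recognized) and
# 10 night-time IR faces (5 recognized).
data = (
    [{"modality": "RGB", "light": "day", "correct": i < 81} for i in range(90)]
    + [{"modality": "IR", "light": "night", "correct": i < 5} for i in range(10)]
)

print(pooled_accuracy(data))          # 0.86
print(balanced_accuracy(data, cond))  # 0.7
subset = balanced_subset(data, cond, per_stratum=10)
print(len(subset))                    # 20
```

On this toy data the pooled accuracy mostly reflects the large daytime RGB stratum, while the balanced accuracy exposes the weak night-time IR condition. Capping the quota by the smallest stratum is one simple way to keep any single condition from dominating an evaluation subset.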
Experimental results show that these strategies yield more consistent, accurate, and interpretable performance estimates, particularly when the input video is compressed and when lightweight models are used.