Seeing the Invisible: Test Prioritization for Object Detection System

Shihao Weng, Yang Feng, Yining Yin, Yuxuan Dai, Jia Liu, Zhihong Zhao
DOI: https://doi.org/10.1007/s10664-024-10539-4
IF: 3.762
2024-01-01
Empirical Software Engineering
Abstract: Object detection models have been deployed in various safety-critical software systems. However, an inadequately tested object detection system may exhibit aberrant behavior in applications, potentially leading to immeasurable losses to users. Annotating test data for object detection is costly, which makes it hard to meet the urgent need to test these systems and ensure their accuracy and reliability. In recent years, many test prioritization techniques for deep learning systems have been proposed, which to some extent alleviate the high cost of test case annotation. However, most current test prioritization methods cannot adapt to the complex characteristics of object detection tasks. Object detection systems need to detect all potential targets in a given image and classify them into the correct categories. Both detection omissions and detection errors should be prioritized so that testers can accurately label and analyze them, which poses additional challenges for the design of a prioritization method. In this paper, we extend our previous work and propose a new prioritization method named DeepView+. The method is designed for object detection systems at the instance level and assists testers in identifying both detection errors and detection omissions. For detection errors, DeepView+ assigns a skepticism score to each predicted bounding box based on classification and localization capability. Moreover, DeepView+ overcomes a shortcoming of existing prioritization methods, which focus only on the prediction results, by introducing a novel algorithm that assigns skepticism scores to potential detection omission zones in each input. By aggregating the scores of the two types of model error, DeepView+ is capable of identifying false positives and false negatives simultaneously. We extensively evaluate the superiority and diversity of DeepView+ through 27 experimental configurations. The experimental results further demonstrate the necessity of finding false negative detection omissions, as well as the outstanding effectiveness of DeepView+ in prioritizing detection omissions.
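The instance-level ranking idea described in the abstract can be illustrated with a minimal Python sketch. This is not the DeepView+ algorithm: the abstract does not give the scoring formulas, so `box_skepticism` is a placeholder (one minus the product of two confidences), the `loc_conf` field is an assumed input, and omission-zone scores are assumed to be precomputed. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PredictedBox:
    image_id: str
    label: str
    cls_conf: float  # classification confidence in [0, 1]
    loc_conf: float  # localization confidence in [0, 1] (assumed to be available)


@dataclass(frozen=True)
class OmissionZone:
    image_id: str
    score: float  # assumed precomputed skepticism for a region with no prediction


def box_skepticism(box: PredictedBox) -> float:
    """Placeholder score (an assumption): high when either confidence is low."""
    return 1.0 - box.cls_conf * box.loc_conf


def prioritize(
    boxes: List[PredictedBox], zones: List[OmissionZone]
) -> List[Tuple[float, str, str]]:
    """Merge both kinds of suspicious instance and rank most suspicious first."""
    ranked = [(box_skepticism(b), b.image_id, "error:" + b.label) for b in boxes]
    ranked += [(z.score, z.image_id, "omission") for z in zones]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


boxes = [
    PredictedBox("img1", "car", cls_conf=0.9, loc_conf=0.9),
    PredictedBox("img2", "dog", cls_conf=0.5, loc_conf=0.6),
]
zones = [OmissionZone("img1", score=0.8)]

for score, image_id, kind in prioritize(boxes, zones):
    print(f"{score:.2f}  {image_id}  {kind}")
```

Putting possible errors and possible omissions in one ranked list means a tester with a limited labeling budget can work from the top and meet both false positives and false negatives, which is the behavior the abstract attributes to aggregating the two score types.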
What problem does this paper attempt to address?