Datactive: Data Fault Localization for Object Detection Systems

Yining Yin,Yang Feng,Shihao Weng,Yuan Yao,Jia Liu,Zhihong Zhao
DOI: https://doi.org/10.1145/3650212.3680329
2024-01-01
Abstract:Object detection (OD) models are seamlessly integrated into numerous intelligent software systems, playing a crucial role in various tasks. These models are typically constructed upon humanannotated datasets, whose quality can greatly affect their performance and reliability. Erroneous and inadequate annotated datasets can induce classification/localization inaccuracies during deployment, precipitating security breaches or traffic accidents that inflict property damage or even loss of life. Therefore, ensuring and improving data quality is a crucial issue for the reliability of the object detection system. This paper introduces Datactive, a data fault localization technique for object detection systems. Datactive is designed to locate various types of data faults including mislocalization and missing objects, without utilizing the prediction of object detection models trained on dirty datasets. To achieve this, we first construct foreground-only and background-included datasets via data disassembling strategies, and then employ a robust learning method to train classifiers using disassembled datasets. Based on the classifier predictions, Datactive produces a unified suspiciousness score for both foreground annotations and image backgrounds. It allows testers to easily identify and correct faulty or missing annotations with minimal effort. To validate the effectiveness, we conducted experiments on three datasets with 6 baselines, and demonstrated the superiority of Datactive from various aspects. We also explored Datactive's ability to find natural data faults and its application in both training and evaluation scenarios.
What problem does this paper attempt to address?