Dataset Preparation for Arbitrary Object Detection: an Automatic Approach Based on Web Information in English

Shucheng Li, Boyu Chang, Bo Yang, Hao Wu, Sheng Zhong, Fengyuan Xu
DOI: https://doi.org/10.1145/3539618.3591661
2023-01-01
Abstract: Automatic dataset preparation can help users avoid labor-intensive and costly manual data annotation. The difficulty of preparing a high-quality dataset for object detection involves three key aspects: relevance, naturality, and balance, which existing works do not address. In this paper, we leverage information from the web and propose a fully automatic dataset preparation mechanism that requires no human annotation: given English text terms describing the target objects, it automatically prepares a high-quality training dataset for the detection task. The mechanism has three key designs: keyword expansion, data de-noising, and data balancing. Our experiments demonstrate that object detectors trained with auto-prepared data are comparable to those trained with benchmark datasets and outperform other baselines. We also demonstrate the effectiveness of our approach on several more challenging real-world object categories that are not included in the benchmark datasets.