Abstract: Accurate real-time object detection is vital across numerous industrial applications, from safety monitoring to quality control. Traditional approaches, however, depend on laborious manual annotation and data collection, and they struggle to adapt to changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline that covers the object detection workflow from data collection to model evaluation. It removes the need for manual labeling and extensive data collection while achieving high accuracy across diverse scenarios. DART comprises four stages: (1) Data Diversification using subject-driven image generation (DreamBooth with SDXL), (2) Annotation via open-vocabulary object detection (Grounding DINO) to generate bounding boxes and class labels, (3) Review of the generated images and pseudo-labels by large multimodal models (InternVL-1.5 and GPT-4o) to ensure their credibility, and (4) Training of real-time object detectors (YOLOv8 and YOLOv10) on the verified data. We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current instantiation of DART increases average precision (AP) from 0.064 to 0.832. Its modular design makes individual components easy to exchange and extend, allowing future algorithm upgrades, integration of new object categories, and adaptation to customized environments without manual labeling or additional data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
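The modular, staged design described above can be pictured as a chain of interchangeable callables. The following is a minimal sketch under stated assumptions, not DART's actual code: `Sample`, `diversify`, `annotate`, `review`, and `run_pipeline` are hypothetical names, and the stubs only stand in for DreamBooth/SDXL generation, Grounding DINO annotation, and InternVL-1.5/GPT-4o review without calling those models. Stage (4), detector training, is left out; the verified samples would be handed to a trainer such as YOLOv8.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class Sample:
    """One image and its pseudo-labels; the image is represented only by an id (hypothetical)."""
    image_id: str
    category: str
    boxes: List[tuple] = field(default_factory=list)  # (x1, y1, x2, y2, label)
    approved: bool = False


class Stage(Protocol):
    """Any stage maps a list of samples to a new list of samples."""
    def __call__(self, samples: List[Sample]) -> List[Sample]: ...


def diversify(n_per_seed: int) -> Stage:
    """Stand-in for subject-driven generation: emits n new sample ids per seed."""
    def run(samples: List[Sample]) -> List[Sample]:
        return [Sample(f"{s.image_id}_gen{i}", s.category)
                for s in samples for i in range(n_per_seed)]
    return run


def annotate(detector: Callable[[Sample], List[tuple]]) -> Stage:
    """Stand-in for open-vocabulary annotation: fills boxes in place via `detector`."""
    def run(samples: List[Sample]) -> List[Sample]:
        for s in samples:
            s.boxes = detector(s)
        return samples
    return run


def review(judge: Callable[[Sample], bool]) -> Stage:
    """Stand-in for LMM review: keeps only samples the `judge` approves."""
    def run(samples: List[Sample]) -> List[Sample]:
        kept = []
        for s in samples:
            s.approved = judge(s)
            if s.approved:
                kept.append(s)
        return kept
    return run


def run_pipeline(stages: List[Stage], seeds: List[Sample]) -> List[Sample]:
    """Apply each stage in order; swapping a component means passing a different stage."""
    data = seeds
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical usage with dummy components.
def dummy_detector(s: Sample) -> List[tuple]:
    # Pretend every image contains one box of its own category.
    return [(0, 0, 10, 10, s.category)]


def dummy_judge(s: Sample) -> bool:
    # Pretend the reviewer rejects every third generated image.
    return not s.image_id.endswith("gen2")


seeds = [Sample("crane", "crane"), Sample("excavator", "excavator")]
verified = run_pipeline(
    [diversify(3), annotate(dummy_detector), review(dummy_judge)], seeds
)
print(len(verified))  # 4
```

Because each stage is just a function with the same input and output type, replacing the annotator or the reviewer (for example, a different open-vocabulary detector) only changes the argument passed to `annotate` or `review`, which mirrors the exchangeability the abstract claims for the pipeline.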

DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training