An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection

Pengfei Qi, Yifei Zhang, Wenqiang Li, Youwen Hu, Kunlong Bai

2024-09-10

Abstract: Detecting objects of interest through language often presents challenges, particularly with objects that are uncommon or complex to describe, due to perceptual discrepancies between automated models and human annotators. These challenges highlight the need for comprehensive datasets that go beyond standard object labels by incorporating detailed attribute descriptions. To address this need, we introduce the Objects365-Attr dataset, an extension of the existing Objects365 dataset, distinguished by its attribute annotations. This dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture and tone. It contains an extensive collection of 5.6M object-level attribute descriptions, meticulously annotated across 1.4M bounding boxes. Additionally, to validate the dataset's effectiveness, we conduct a rigorous evaluation of YOLO-World models at different scales, measuring their detection performance and demonstrating the dataset's contribution to advancing object detection.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to address the difficulties of detecting objects through language prompts, especially objects that are uncommon or complex to describe. Specifically, it makes the following points:

1. **Limitations of Existing Datasets**: Existing datasets rely on standardized object vocabularies, which limits their adaptability to custom text queries. The main limitations are:
   - Semantic Ambiguity: Short or partial object names may cause confusion, reducing the model's ability to distinguish between similar entities.
   - Insufficient Expression: Relying solely on object names for detection queries may fail to capture the complete descriptive information.
2. **Introduction of Attribute Annotations**: To overcome these limitations, the paper proposes using attributes such as color, material, state, and texture as descriptive anchors. This approach has the following advantages:
   - Enhanced Context: Attributes can supplement missing contextual information, improving the completeness of descriptions for ambiguous categories.
   - Improved Interpretability: For unfamiliar categories, attributes can be mapped to known categories through pre-trained language models, facilitating understanding.
   - Detailed Representation: Attributes provide more detailed category descriptions, helping to characterize objects that are difficult to describe.
3. **Dataset and Automatic Annotation Pipeline**: Building on this, the researchers developed the Objects365-Attr dataset and designed an automatic annotation pipeline to streamline the annotation process. The dataset covers familiar objects and also uses attribute descriptions to better express the characteristics of unfamiliar ones.

In summary, the paper attempts to improve existing datasets for open-vocabulary detection (OVD) and referring expression comprehension (REC) by introducing detailed attribute descriptions, thereby enhancing model detection performance.
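The idea of attributes as descriptive anchors can be illustrated with a minimal sketch. It assumes a hypothetical record type, `AttributedBox`, and a hypothetical helper, `build_prompt`, that places any known attributes before the category name to form a text query. Neither the record layout nor the prompt format comes from the paper; only the five attribute types are taken from the abstract.

```python
from dataclasses import dataclass, field

# Attribute types named in the abstract; the ordering here is an assumption.
ATTRIBUTE_TYPES = ("color", "material", "state", "texture", "tone")


@dataclass
class AttributedBox:
    """Hypothetical record: one bounding box with a category and attributes."""
    category: str
    bbox: tuple  # (x, y, width, height); this layout is an assumption
    attributes: dict = field(default_factory=dict)


def build_prompt(box: AttributedBox) -> str:
    """Compose a text query by placing known attributes before the category."""
    words = [box.attributes[t] for t in ATTRIBUTE_TYPES if box.attributes.get(t)]
    return " ".join(words + [box.category])


box = AttributedBox(
    category="cup",
    bbox=(10.0, 20.0, 50.0, 80.0),
    attributes={"color": "white", "material": "ceramic"},
)
print(build_prompt(box))  # prints "white ceramic cup"
```

A box without attributes falls back to the bare category name, which corresponds to the name-only query style that the summary describes as ambiguous.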

