An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection

Pengfei Qi, Yifei Zhang, Wenqiang Li, Youwen Hu, Kunlong Bai
2024-09-10
Abstract: Detecting objects of interest through language often presents challenges, particularly with objects that are uncommon or complex to describe, due to perceptual discrepancies between automated models and human annotators. These challenges highlight the need for comprehensive datasets that go beyond standard object labels by incorporating detailed attribute descriptions. To address this need, we introduce the Objects365-Attr dataset, an extension of the existing Objects365 dataset, distinguished by its attribute annotations. This dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture and tone. It contains an extensive collection of 5.6M object-level attribute descriptions, meticulously annotated across 1.4M bounding boxes. Additionally, to validate the dataset's effectiveness, we conduct a rigorous evaluation of YOLO-World models at different scales, measuring their detection performance and demonstrating the dataset's contribution to advancing object detection.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address the difficulty of detecting objects through language prompts, especially objects that are uncommon or complex to describe. Specifically, it makes the following points:

1. **Limitations of the Dataset**: Existing datasets rely on standardized object vocabularies, which limits their adaptability to custom text queries. The main limitations are:
   - Semantic Ambiguity: Short or partial object names may cause confusion, reducing the model's ability to distinguish between similar entities.
   - Insufficient Expression: Relying solely on object names for detection queries may fail to capture the complete descriptive information.
2. **Introduction of Attribute Annotations**: To overcome these limitations, the paper proposes using attributes such as color, material, state, and texture as descriptive anchors. This approach has the following advantages:
   - Enhanced Context: Attributes can supplement missing contextual information, improving the completeness of descriptions for ambiguous categories.
   - Improved Interpretability: For unfamiliar categories, attributes can be mapped to known categories through pre-trained language models, facilitating understanding.
   - Detailed Representation: Attributes provide more detailed category descriptions, helping to characterize objects that are difficult to describe.
3. **Dataset and Automatic Annotation Pipeline**: Building on this, the researchers developed the Objects365-Attr dataset and designed an automatic annotation pipeline to streamline the annotation process. The dataset not only supports recognizing familiar objects but also improves how the characteristics of unfamiliar objects are expressed through attribute descriptions.

In summary, the paper attempts to improve existing datasets for open vocabulary detection (OVD) and referring expression comprehension (REC) tasks by introducing detailed attribute descriptions, thereby enhancing models' detection performance.
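To illustrate how attributes can act as descriptive anchors for a detection query, here is a minimal Python sketch. It is not the paper's pipeline or the dataset's actual schema: the `AttributedBox` record, the `(x, y, width, height)` box layout, the `ATTRIBUTE_ORDER` word ordering, and the `build_prompt` helper are all assumptions made for illustration. Only the attribute types (color, material, state, texture, tone) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Attribute types named in the abstract. The order in which they are
# placed in the prompt is an assumption, not something the paper specifies.
ATTRIBUTE_ORDER = ("state", "tone", "color", "texture", "material")


@dataclass
class AttributedBox:
    """Hypothetical record: one bounding box, its category, and its attributes."""
    category: str
    bbox: Tuple[float, float, float, float]  # assumed (x, y, width, height)
    attributes: Dict[str, str] = field(default_factory=dict)


def build_prompt(box: AttributedBox) -> str:
    """Compose a text query by placing the known attributes before the category name."""
    words = [box.attributes[key] for key in ATTRIBUTE_ORDER if box.attributes.get(key)]
    words.append(box.category)
    return " ".join(words)


box = AttributedBox(
    category="kettle",
    bbox=(10.0, 20.0, 50.0, 80.0),
    attributes={"color": "red", "material": "metal"},
)
print(build_prompt(box))  # red metal kettle
```

A box without attributes falls back to the bare category name, which corresponds to the name-only query that the summary describes as ambiguous or insufficiently expressive.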