V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Jiaqi Huang, Zunnan Xu, Xiu Li, Kehong Yuan, Yanyan Zu, Jiayao Ha, Qiong Gao, Licheng Jiao
2024-06-18
Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories and potential encounters with previously unknown or unseen objects. These difficulties call for public benchmarks and competitions to advance the field of object detection. Inspired by the success of the previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: This track focuses on detecting objects from a large set of 13,204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the challenges faced in object detection in real-world scenarios, particularly in handling vast vocabulary and open vocabulary situations. Specifically:

1. **Vast Vocabulary Object Detection**:
   - **Problem Background**: There are numerous types of objects in the real world, and existing object detection datasets usually contain a limited number of categories, which restricts the model's generalization ability in practical applications.
   - **Objective**: By designing a large-scale dataset (V3Det) containing 13,204 categories, the goal is to evaluate and advance the performance of object detection algorithms in handling a vast number of categories.
2. **Open Vocabulary Object Detection**:
   - **Problem Background**: In the real world, it is common to encounter previously unseen or unknown object categories, and existing object detection methods perform poorly in handling these unknown categories.
   - **Objective**: To develop object detection algorithms capable of recognizing and locating unknown categories, enhancing the model's robustness and generalization ability in open vocabulary scenarios.

To achieve these objectives, the paper organized the V3Det Challenge 2024, which is divided into two tracks:

- **Track 1**: Vast Vocabulary Object Detection, testing the model's detection capability on 13,204 categories.
- **Track 2**: Open Vocabulary Object Detection, requiring the model to detect both known and unknown object categories.

Through the competition in these two tracks, the paper aims to drive innovation and development in the field of object detection, providing direction and reference for future research.