UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection

Haomiao Liu, Hao Xu, Chuhuai Yue, Bo Ma
2024-12-13
Abstract: Unknown Object Detection (UOD) aims to identify objects of unseen categories, differing from the traditional detection paradigm limited by the closed-world assumption. A key component of UOD is learning a generalized representation, i.e., objectness for both known and unknown categories to distinguish and localize objects from the background in a class-agnostic manner. However, previous methods obtain supervision signals for learning objectness in isolation from either localization or classification information, leading to poor performance for UOD. To address this issue, we propose a transformer-based UOD framework, UN-DETR. Based on this, we craft Instance Presence Score (IPS) to represent the probability of an object's presence. For the purpose of information complementarity, IPS employs a strategy of joint supervised learning, integrating attributes representing general objectness from the positional and the categorical latent spaces as supervision signals. To enhance IPS learning, we introduce a one-to-many assignment strategy to incorporate more supervision. Then, we propose Unbiased Query Selection to provide premium initial query vectors for the decoder. Additionally, we propose an IPS-guided post-process strategy to filter redundant boxes and correct classification predictions for known and unknown objects. Finally, we pretrain the entire UN-DETR in an unsupervised manner, in order to obtain an objectness prior. Our UN-DETR is comprehensively evaluated on multiple UOD and known detection benchmarks, demonstrating its effectiveness and achieving state-of-the-art performance.
Computer Vision and Pattern Recognition
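The abstract states that IPS is supervised jointly by positional and categorical cues but does not give the formula. As a minimal sketch, assuming the joint target is a weighted geometric mean of the IoU between a matched prediction and its ground-truth box (positional cue) and the predicted probability of the ground-truth class (categorical cue), IPS could be trained with binary cross-entropy against that soft target. The function names, the blending rule, and the weight `alpha` are hypothetical illustrations, not the paper's definitions. Because `matches` maps query indices to ground-truth indices, several queries may share one ground truth, which is where a one-to-many assignment would plug in.

```python
import math
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def joint_ips_target(pred_box: Box, gt_box: Box, gt_class_prob: float,
                     alpha: float = 0.5) -> float:
    """HYPOTHETICAL joint target: weighted geometric mean of a positional cue
    (IoU with the matched GT box) and a categorical cue (predicted probability
    of the GT class). Not the paper's published formula."""
    iou = box_iou(pred_box, gt_box)
    return (iou ** alpha) * (gt_class_prob ** (1.0 - alpha))


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy against a soft target in [0, 1]."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def ips_loss(ips_preds: Sequence[float], pred_boxes: Sequence[Box],
             gt_class_probs: Sequence[float], matches: Dict[int, int],
             gt_boxes: Sequence[Box], alpha: float = 0.5) -> float:
    """Mean IPS loss over all queries. `matches` maps query index -> GT index
    (several queries may map to one GT). Unmatched queries are treated as
    background and receive target 0."""
    total = 0.0
    for q, p in enumerate(ips_preds):
        if q in matches:
            target = joint_ips_target(pred_boxes[q], gt_boxes[matches[q]],
                                      gt_class_probs[q], alpha)
        else:
            target = 0.0
        total += bce(p, target)
    return total / len(ips_preds)
```

Multiplying the two cues gives a high target only to a prediction that is both well localized and confidently categorized, which is one way to express the information complementarity the abstract describes; a target built from IoU alone or class probability alone would correspond to the single-source supervision the authors criticize.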
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses **Unknown Object Detection (UOD)**. Traditional object detectors rely on the closed-world assumption: they recognize only the categories predefined in the training set and cannot handle unseen categories. This is a significant limitation in real-world applications, because in open environments a system must recognize and localize objects that never appeared in the training data.

#### Specific challenges:

1. **Limitations of the closed-world assumption**: Traditional detection frameworks recognize only the known categories in the training set and ignore unseen objects.
2. **Lack of effective supervision signals**: Unknown objects have no labels, so it is difficult to obtain supervision signals for learning their features.
3. **Deficiencies of existing methods**: Previous UOD methods usually derive the supervision for objectness from only one source, either classification or localization information, which leads to poor performance, especially in recall and precision.

#### Solutions proposed in the paper:

To overcome these challenges, the authors propose a new Transformer-based UOD framework, **UN-DETR**. Its main innovations are:

1. **Instance Presence Score (IPS) under joint supervision**:
   - IPS represents the probability that an object is present, and it learns general objectness by combining information from the positional and categorical latent spaces.
   - A joint supervision strategy draws supervision signals from the positional and categorical dimensions simultaneously, so that the two complement each other and the model generalizes better.
2. **From one-to-one to one-to-many assignment**:
   - Instead of matching each ground-truth object to a single prediction, a one-to-many assignment strategy matches it to several predictions. This increases the number of positive samples and provides more supervision for learning general objectness.
3. **Unbiased Query Selection**:
   - When selecting the initial queries for the decoder, an additional IPS predictor replaces the original classification head as the selection criterion. This removes the bias toward known categories, so selection depends on whether an object is present rather than on its specific category.
4. **IPS-guided post-processing strategy**:
   - It combines IPS-guided non-maximum suppression (NMS), which filters redundant bounding boxes, with a dual-criteria unknown object discrimination protocol, which corrects classification predictions to separate known from unknown objects.
5. **Unsupervised pre-training**:
   - The entire UN-DETR is pre-trained without labels, combining a region proposal generator with a self-supervised image encoder to obtain a stronger objectness prior.

### Summary

By proposing the UN-DETR framework, this paper addresses the key problem in unknown object detection: how to learn to recognize and localize unseen objects without explicit labels. Through joint supervision, one-to-many assignment, unbiased query selection, IPS-guided post-processing, and unsupervised pre-training, UN-DETR achieves state-of-the-art performance on multiple UOD and known-object detection benchmarks and significantly improves unknown object detection.
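The IPS-guided post-processing is described only at the level of its two components, without thresholds or exact decision rules. As a minimal sketch, assuming IPS-guided NMS means greedy NMS that ranks candidates by IPS instead of by class confidence, and assuming the dual criteria are (i) IPS at or above a hypothetical `ips_thr` to count as an object and (ii) a best known-class probability below a hypothetical `cls_thr` to be labeled unknown, the pipeline could look like this. All function names and threshold values are illustrative assumptions, not the paper's settings.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ips_nms(boxes: Sequence[Box], ips: Sequence[float],
            iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS that ranks candidates by IPS instead of by class score."""
    order = sorted(range(len(boxes)), key=lambda i: ips[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept,
        # higher-IPS box by more than iou_thr.
        if all(box_iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


def label_box(ips_score: float, class_probs: Sequence[float],
              ips_thr: float = 0.5, cls_thr: float = 0.3) -> Optional[str]:
    """HYPOTHETICAL dual criteria: IPS decides whether an object is present;
    the best known-class probability decides known vs. unknown.
    Returns None for background."""
    if ips_score < ips_thr:
        return None
    best = max(range(len(class_probs)), key=lambda c: class_probs[c])
    return f"known:{best}" if class_probs[best] >= cls_thr else "unknown"


def postprocess(boxes: Sequence[Box], ips: Sequence[float],
                class_probs: Sequence[Sequence[float]], iou_thr: float = 0.5,
                ips_thr: float = 0.5,
                cls_thr: float = 0.3) -> List[Tuple[int, str]]:
    """IPS-guided NMS, then known/unknown labeling; background is dropped."""
    results: List[Tuple[int, str]] = []
    for i in ips_nms(boxes, ips, iou_thr):
        label = label_box(ips[i], class_probs[i], ips_thr, cls_thr)
        if label is not None:
            results.append((i, label))
    return results
```

Ranking by IPS rather than by the maximum known-class score means a box covering an unseen object is not suppressed merely because no known class fits it well; with class-score ranking, such a box would tend to lose to overlapping boxes that happen to resemble a known class.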