LVMUM: Toward Open-World Object Detection with Large Vision Models and Unsupervised Modeling

Yangyang Huang, Xing Xi, Weiye Wu, Ronghua Luo
DOI: https://doi.org/10.1007/978-981-97-5600-1_6
2024-01-01
Abstract: Open-world object detection (OWOD), an emerging and challenging task in object detection, requires a model to detect both known and unknown objects in dynamic environments and to learn incrementally from newly acquired knowledge. However, current OWOD methods focus on labeling regions with high objectness scores as unknown objects. These heuristic annotation methods rely entirely on the supervision of known objects, which leads to label bias. To solve this problem, we propose the Object Reconstruction-based Weibull Model (ORWM) method, which uses object-level semantic information for feature reconstruction to perform unsupervised modeling of the foreground and background. Another challenge in the modeling process is the limited annotation available for unknown objects. Therefore, we propose an Unsupervised Region Proposal Generation method based on SAM (SAM-URPG), which generates original pseudo labels for unknown objects using the zero-shot ability of the large vision model. Experimental results show that our proposed method significantly improves the ability to detect unknown objects on the MS-COCO dataset. It increases U-Recall by 14.0 to reach 50.9, surpassing the previous state-of-the-art (SOTA) method by 34%, while maintaining competitive performance in detecting known objects. Additionally, our method builds the model from a pure convolutional neural network rather than a dense attention mechanism. This approach surpasses the SOTA deformable DETR-based method with a speed of 9.95 FPS, while maintaining the inference speed advantage of SOTA Faster R-CNN-based methods.